mdadm.conf disappeared - tried to recreate it

I had to reboot my system during a reshape from RAID5 to RAID6 and landed in the rescue system.

Examination showed that my RAID array hadn’t been recognized and that, for some reason, there was no mdadm.conf to be found anywhere on the system.

mdadm had no problems finding the array with a scan, and the array is reshaping again now.
I tried rebuilding mdadm.conf on my own, but I can only find articles on how to add an array, not on how to rebuild the file from scratch.

Right now I have this:

DEVICE partitions
ARRAY /dev/md/md0 metadata=1.2 spares=1 name=<FQDN>:md0 UUID=<UUID>

Can someone tell me if I did it correctly? The <> are placeholders; the information in them is verified, but I don’t want to publicise it.

P.S.: I created the array on the HDDs without creating partitions first (accident with Cockpit and not reversible afterwards since there was already data on it…)

Having the array on the HDDs without partitions is not an issue. You can partition the array itself and create the file systems with gparted, gdisk, parted, or similar tools.

What is the output of cat /proc/mdstat? You should see something like

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md127 : active raid6 sdd[3] sde[1] sdb[4] sdc[5]
      5766400000 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 5/22 pages [20KB], 65536KB chunk

unused devices: <none>
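
If you would rather check this from a script than by eye, the state and any reshape or recovery progress can be pulled out of that file. A minimal sketch, assuming the usual /proc/mdstat layout shown above; md_summary is a made-up helper name, not an mdadm tool:

```shell
# md_summary [FILE]: print "<array> <state>" per md array, plus
# "<operation>=<percent>" when a reshape/recovery/resync/check is running.
# FILE defaults to /proc/mdstat (hypothetical helper, for illustration only).
md_summary() {
    awk '
        # Array header, e.g. "md0 : active raid6 sda[0] ..."
        /^md[^ ]* : / { if (line != "") print line; line = $1 " " $3 }
        # Progress line, e.g. "[===>...]  reshape = 84.4% (...) finish=..."
        $2 ~ /^(reshape|recovery|resync|check)$/ && $3 == "=" {
            line = line " " $2 "=" $4
        }
        END { if (line != "") print line }
    ' "${1:-/proc/mdstat}"
}
```

Running md_summary with no argument reads /proc/mdstat. An array that was assembled but not started should show up with the state "inactive".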

OK, it seems my actual problem got lost.

  1. The array is still reshaping (80% by now).
  2. After a reboot I can start it with mdadm --run /dev/md0.
  3. After step 2 it picks the reshape up where it left off, and I can mount the partition on the drive.

The problem is that the array gets assembled on reboot but doesn’t start:

[root@<FQDN>]# sudo mdadm --detail /dev/md0
           Version : 1.2
        Raid Level : raid6
     Total Devices : 7
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 7

     Delta Devices : 1, (-1->0)
         New Level : raid6
        New Layout : left-symmetric
     New Chunksize : 512K

              Name :<FQDN>:md0  (local to host
              UUID : 50f387c6:39b2da4f:e2b3f7b3:eeae3d82
            Events : 197230

    Number   Major   Minor   RaidDevice

       -       8       64        -        /dev/sde
       -       8       32        -        /dev/sdc
       -       8        0        -        /dev/sda
       -       8      112        -        /dev/sdh
       -       8       80        -        /dev/sdf
       -       8       48        -        /dev/sdd
       -       8       96        -        /dev/sdg

I found a million tutorials for Ubuntu and Arch, but they all use system-tools Fedora doesn’t have (and I don’t really know what they actually do).

I need help getting my system to properly start my array on boot, so I can mount its partition via /etc/fstab.

I have services actually depending on the data, so I can’t have the system boot without it.

If I just do this:

mdadm --stop /dev/md0
mdadm --assemble /dev/md0

my console hangs and I have to ssh in again (or switch to another tty).
BUT if I do

mdadm --stop /dev/md0
mdadm --assemble --scan

my array gets initialised and I get:

[root@serena ~]# mdadm --assemble --scan ^C
[root@serena ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[0] sde[6] sdg[7] sdh[5] sdf[3] sdd[2] sdc[1]
      62502989824 blocks super 1.2 level 6, 512k chunk, algorithm 18 [7/6] [UUUUU_U]
      [================>....]  reshape = 84.4% (13196004352/15625747456) finish=321.5min speed=125952K/sec
      bitmap: 10/117 pages [40KB], 65536KB chunk

unused devices: <none>


[root@serena ~]# mdadm --detail /dev/md0
           Version : 1.2
     Creation Time : Wed Jan  5 18:21:28 2022
        Raid Level : raid6
        Array Size : 62502989824 (58.21 TiB 64.00 TB)
     Used Dev Size : 15625747456 (14.55 TiB 16.00 TB)
      Raid Devices : 7
     Total Devices : 7
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Jan 11 16:40:01 2022
             State : clean, degraded, reshaping
    Active Devices : 6
   Working Devices : 7
    Failed Devices : 0
     Spare Devices : 1

            Layout : left-symmetric-6
        Chunk Size : 512K

Consistency Policy : bitmap

    Reshape Status : 84% complete
     Delta Devices : 1, (6->7)
        New Layout : left-symmetric

              Name :  (local to host
              UUID : 50f387c6:39b2da4f:e2b3f7b3:eeae3d82
            Events : 197265

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       5       8      112        4      active sync   /dev/sdh
       7       8       96        5      spare rebuilding   /dev/sdg
       6       8       64        6      active sync   /dev/sde

That output clearly shows that RaidDevice 5 (/dev/sdg) is still rebuilding. Do not try to hurry it; wait patiently until the array is fully rebuilt. Every interruption by a shutdown makes the rebuild take longer, since the array has to verify its current condition on each power-up before the rebuild can restart.

I had a raid 5 with 3 disks and I added a 4th disk to convert it to raid 6. The rebuild took almost 3 days running 24/7. My drives were all 3 TB.

A rebuild like this spreads the data across the full span of drives. It constantly reads from the other drives and writes to the last drive, while also rearranging the data on the first drives and recomputing all parity stripes. Basically, that type of rebuild may have to touch and reposition every bit of data on all the drives, which can be very time-consuming.

I don’t see anything wrong with the array data you posted except the rebuild with one drive that is in progress.

There is one thing that seems wrong with your commands:
you posted mdadm --assemble /dev/md0, which hung.
The man page for mdadm says:

              Assemble the components of a previously created array into an active array.  Components can be explicitly given or
              can be searched for.  mdadm checks that the components do form a bona fide array, and can, on request, fiddle
              superblock information so as to assemble a faulty array.

I read that to mean that you must either specify the devices to assemble or use --scan, as you did.
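
On the original question about the missing mdadm.conf: instead of typing the ARRAY line by hand, it can usually be generated from the running array with mdadm --detail --scan. A minimal sketch, assuming Fedora's config path /etc/mdadm.conf and that the initramfs keeps its own copy of the file (hence the dracut step); make_mdadm_conf is a made-up helper name. It drops the spares= field, which as far as I know only matters to mdadm --monitor and would be stale once the rebuild finishes:

```shell
# make_mdadm_conf: read "mdadm --detail --scan" output on stdin and print
# an mdadm.conf body (hypothetical helper, for illustration only).
make_mdadm_conf() {
    # "partitions" makes mdadm consider every device in /proc/partitions,
    # which includes whole disks that carry no partition table.
    echo 'DEVICE partitions'
    # Keep only ARRAY lines and drop the spares= count (assumption: it is
    # not wanted once the array has no spare left).
    sed -n 's/ spares=[0-9][0-9]*//; /^ARRAY /p'
}

# Intended use, as root (left commented out because it overwrites the file):
#   mdadm --detail --scan | make_mdadm_conf > /etc/mdadm.conf
#   dracut --force    # rebuild the initramfs so early boot sees the config
```

Compare the generated ARRAY line with the one you wrote by hand before replacing anything; the UUID is the part that has to match.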

I suspect that as soon as the rebuild is complete the array will activate automatically and can then be mounted as normal.
/proc/mdstat showed that the rebuild should complete approx 5 1/2 hours after you ran that command, so it will not be too much longer.
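
That estimate can be reproduced from the numbers in the reshape line of /proc/mdstat: remaining blocks divided by the current speed. A small sketch, assuming both are in KiB as mdstat prints them; reshape_eta_min is a made-up helper name:

```shell
# reshape_eta_min DONE_KIB TOTAL_KIB SPEED_KIB_PER_S
# Prints the remaining time in whole minutes at the current speed.
reshape_eta_min() {
    echo $(( ($2 - $1) / $3 / 60 ))
}

# Values from "reshape = 84.4% (13196004352/15625747456) ... speed=125952K/sec"
reshape_eta_min 13196004352 15625747456 125952   # prints 321
```

That agrees with the finish=321.5min that mdstat reported, though the speed changes with load, so it is only a rough figure.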

Hmmm, ok - I hope it will.

It is weird though.
When it was initially building as RAID5 (with 5 disks), I had to reboot once and it just came back up.

I actually wasn’t trying to rush it.
The server crashed and came up in rescue-mode (because of the mentioned problems).

Sadly that can happen while reshaping arrays with 16TB drives.
The speed isn’t accurate though, since that is without load.

Like I mentioned, the array is in use (or at least the 5 disks forming the previous RAID 5 are; I’m not quite sure how that works).

Aha!
Now I see where the issues come from:
RAID 5 with 5 disks is being converted to RAID 6 with 7 disks.
Since RAID 6 can tolerate only 2 drives failing at the same time and RAID 5 only 1, it should have been done in 2 steps: first add one disk and convert to RAID 6, then, once that rebuild is done, add the last drive and rebuild with 7 drives. Adding 2 drives at the same time seems like it could remove all RAID fail-safe features.

It would appear that, since you added 2 drives at the same time as doing the conversion, the system saw it as 2 failed drives and had to do a total rebuild from the initial 5 drives while it did the conversion.

Long story short – have patience and let the array finish rebuilding. 16 TB drives will take a long time as you already know.

You are able to manually activate the array, and I assume you were able to start the services that depend on that array so things do not appear to be critical. Even if the system is not functional you cannot hurry the rebuild, so patience is the only thing I can suggest.


OK, so I should remember not to do it like this again.
I can live without the security for a few more hours, I think. (The drives have under a month under their belt and are all past the 5-day threshold of the usual infant mortality, so I don’t expect failures any time soon.)

And yeah, I was able to manually let the system automatically assemble the array.
What perplexed me was that it is able to automatically detect that there is an array that is reshaping, but didn’t automatically start it.

Maybe the fact that it started as read-only when I just did the --run should have been a hint :)

If this inconvenience saves me from a whole 16TB reshape and everything works after this, I am still happy though.

I will check if it works after the rebuild (when it’s convenient, since it involves a reboot) and if it does, I will mark it as the answer.
It will probably be soon, though, since hot-plug also seems to be broken and I bought an SSD to replace the system drive, which is reallocating too many sectors for my taste. (I hope that’ll go smoother than the RAID reshape; it will be the first time I do this without reinstalling :) )