mdadm RAID usage problem

Hello all :),
I am using Fedora CoreOS and am having trouble removing a disk from a RAID1 array.
As they say, “terminal output paints a thousand words”, so here is mine to demonstrate the problem:

Fedora CoreOS 37.20230303.3.0
[core@baller ~]$ sudo bash
[root@baller core]# mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Sat Feb  5 21:07:27 2022
        Raid Level : raid1
        Array Size : 9766303680 (9.10 TiB 10.00 TB)
     Used Dev Size : 9766303680 (9.10 TiB 10.00 TB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Mar 29 18:02:53 2023
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : any:backup
              UUID : 555c5555:f37db731:bcf64d0c:94f60b41
            Events : 482208

    Number   Major   Minor   RaidDevice State
       2       8       49        0      active sync   /dev/sdd1
       1       8        1        1      active sync   /dev/sda1



[root@baller core]# umount /dev/md127
[root@baller core]# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
[root@baller core]# mdadm --fail /dev/sdd --remove /dev/sdd
mdadm: /dev/sdd does not appear to be an md device
[root@baller core]# mdadm --fail /dev/sdd1 --remove /dev/sdd1
mdadm: /dev/sdd1 does not appear to be an md device

As you can see, mdadm reports that the device is not an md device, yet just before that, --detail reported it as part of the array.
I have not done this before, so I am unsure if this is the correct approach… or if there is another problem.

Please offer any advice you can.

Many thanks,
-bn

I believe the raid array needs to be running for --fail to be applied; also, you should --fail sdd by its kernel name. Quoting the man page:

--fail, -f
This allows the hot-plug system to remove devices that have fully disappeared from the kernel. It will first fail and then remove the device from any array it belongs to. The device name given should be a kernel device name such as “sda”, not a name in /dev.
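In other words, something like this (untested on my side, and the array needs to still be assembled for it to work):

mdadm --incremental --fail sdd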

Thanks for the advice :+1:

Sadly this does not change how mdadm sees the device:

mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Sat Feb  5 21:07:27 2022
        Raid Level : raid1
        Array Size : 9766303680 (9.10 TiB 10.00 TB)
     Used Dev Size : 9766303680 (9.10 TiB 10.00 TB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Mar 29 18:59:01 2023
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : any:backup
              UUID : 650c5059:f37db731:bcf64d0c:94f60b41
            Events : 482208

    Number   Major   Minor   RaidDevice State
       2       8       49        0      active sync   /dev/sdd1
       1       8        1        1      active sync   /dev/sda1
[root@baller core]# mdadm --fail /dev/sdd
mdadm: /dev/sdd does not appear to be an md device
[root@baller core]# mdadm --fail /dev/sdd1
mdadm: /dev/sdd1 does not appear to be an md device


Since mdadm shows /dev/sdd1 as the raid member, the fail command should address the member in that way.

Most users define a raid array using whole devices, but it is perfectly fine to do so with partitions as well.
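For example, an array built from partitions would be created like this (hypothetical device names, shown only to illustrate the member naming):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdd1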

For @bugthing, a quick search with Google gives me these:

I have not tested those since I do not use RAID 1, but both seem to be pretty standard.


YES!.. thanks to that second link I have made progress… my problem was that I had formed the --fail command incorrectly; it should be

mdadm --fail /dev/md127 /dev/sdd1

It seems so many posts on this topic talk about stopping the raid first and don't seem to include the raid device in the fail command… but all I needed was the above.
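
For anyone finding this later, the rest of the removal follows the standard mdadm pattern (I have only run the --fail so far; --remove is the documented next step, and --zero-superblock is only for when the disk is leaving the array for good, since it destroys the md metadata on that member):

mdadm --fail /dev/md127 /dev/sdd1
mdadm --remove /dev/md127 /dev/sdd1
mdadm --zero-superblock /dev/sdd1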

Thanks for the reply, much appreciated :heart_decoration:
