Trying to set up a full RAID1 installation

Hi,

I’m still trying to install FCOS in a full RAID1 configuration on a bare-metal system.

I’ll try to be as clear as possible about my current progress :)

  1. Boot from the live USB ISO.
  2. Add a new user, add it to the sudo group, and allow sshd password login for easier access over SSH (optional ;) )
  3. Create a RAID array on the whole devices:
    sudo mdadm --create /dev/md127 --level=1 --metadata=0.90 --raid-devices=2 /dev/sda /dev/sdb
  4. Run: sudo coreos-installer install /dev/md127 --ignition /tmp/config.ign
  5. Manually run what is normally done after the first reboot: growpart /dev/md127 4
  6. Mount the 4th partition (root) on /tmp/md4 and run: sudo xfs_growfs /tmp/md4
  7. Mount the 1st partition (/boot) on /tmp/md1 and add the following at the top of /tmp/md1/grub2/grub.cfg (OK, it’s hardcoded; later I’ll try to list all devices labeled boot and choose one, giving priority to the RAID):
insmod mdraid1x
insmod mdraid09

set pager=1
search --label boot --set boot
#set root=$boot
set root=(md/md127,gpt1)
  8. Create /tmp/md4/ostree/deploy/fedora-coreos/deploy/{some very long ID :p}/etc/mdadm.conf containing the output of sudo mdadm --detail --scan

  9. Check the layout with lsblk -M -o NAME,PARTLABEL,MOUNTPOINT:

    NAME      PARTLABEL  MOUNTPOINT
    loop0                /sysroot
,-> sda
'-> sdb
 `--md127
    |-md127p1 boot       /tmp/md1
    |-md127p2 EFI-SYSTEM
    |-md127p3 BIOS-BOOT
    `-md127p4 root       /tmp/md4
    sdc
    `-sdc1
    sdd
    `-sdd1

At this point everything looks right, but FCOS still boots from a single drive (e.g. /dev/sda), and mdmonitor.service returns an error…

Can you help me “hack” the end of the process, please?
I need help telling the initramfs (dracut? something else? my knowledge doesn’t go that far) how to find and use md127p4 as root. It’s right there, already used by GRUB… So near and yet so far! :D

I know there’s a discussion on GitHub about handling RAID for the root filesystem.

I’m not an expert at all, but if I understand things correctly, some Ignition parts, like RAID, should be handled in coreos-installer, rather than making a “hardware” decision such as partitioning by running Ignition after the disk setup is already done…

Thanks for reading ;)

I think the problem is that your /etc/mdadm.conf isn’t in the initramfs. You might have to add some kernel args so that the RAID array gets assembled early on. AFAIK we haven’t paved this path yet, so (as you know) you’re operating a bit blind :)

Full support for root-on-RAID will be added by https://github.com/coreos/fedora-coreos-config/pull/503.

I think the final bit you’re missing for this hack is to add rd.md.uuid=... root=UUID=... to the kernel command line.

Right. It looks like he is doing RAID1 across entire disks (not just for the root filesystem), though. I don’t know if we have any plans to address that, do we?

I think we should. Filed as “metal: Support redundant bootable disks” (issue #581 in coreos/fedora-coreos-tracker on GitHub).

Hi

I’ve just come back to my installation process, but with a different approach…
I have a spare PC, so I decided to install FCOS on it via PXE, on a single HDD; everything works fine there.
But my question still stands.

On the PC with the RAID1 HDDs, I want to run a “diskless” PXE FCOS with this Ignition config:

variant: fcos
version: 1.1.0

passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa blablabla

storage:
  raid:
  - name: MediaRaid
    level: raid1
    devices:
    - /dev/disk/by-id/ata-WDC_WD10SPZX-80Z10T2_WD-WXL1A49KPYFD
    - /dev/disk/by-id/ata-WDC_WD10SPZX-80Z10T2_WD-WX41A49H9FT4

  disks:
  - device: /dev/md/MediaRaid
    partitions:
    - number: 1
      should_exist: true
      label: RaidPart

  filesystems: 
    - path: /media
      device: /dev/disk/by-label/RaidPart
      format: xfs
      label: Media
      wipe_filesystem: false
      with_mount_unit: true

Now my concern is about RAID1 on entire disks: it looks like FCOS tries to create partitions on “/dev/md/MediaRaid” before creating the “/dev/md/MediaRaid” RAID device, and booting fails with this error:

disks: createPartitions op(1): [failed] waiting for device [/dev/md/MediaRaid]: device dev-MediaRaid.device timeout

Hi @nemric, just wondering if you ever found a solution to this problem?
I am having what sounds like a similar issue: my device unit is timing out.

@bn Ignition can’t partition a RAID volume. Instead, create partitions first, then put RAID volumes on those, then put filesystems in those.
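A minimal Butane sketch of that order, assuming two whole disks dedicated to data. The by-id paths, the raid.1.x partition labels, the array name data, and the /var/data mount point are all hypothetical placeholders. Each disk gets one partition, the mirror is built from those partitions, and the filesystem goes directly on the md device:

```yaml
variant: fcos
version: 1.1.0
storage:
  disks:
    # Hypothetical disk IDs; wipe_table erases any existing partition table
    - device: /dev/disk/by-id/ata-EXAMPLE-DISK-A
      wipe_table: true
      partitions:
        - label: raid.1.1
          number: 1
          size_mib: 0   # 0 = use all remaining space on the disk
    - device: /dev/disk/by-id/ata-EXAMPLE-DISK-B
      wipe_table: true
      partitions:
        - label: raid.1.2
          number: 1
          size_mib: 0
  raid:
    # The array is assembled from the partitions above, not from whole disks
    - name: data
      level: raid1
      devices:
        - /dev/disk/by-partlabel/raid.1.1
        - /dev/disk/by-partlabel/raid.1.2
  filesystems:
    # No partition table on the array itself: format /dev/md/data directly
    - path: /var/data
      device: /dev/md/data
      format: xfs
      label: data
      with_mount_unit: true
```

This assumes a normal install, where Ignition runs only once. In a live PXE setup, where Ignition runs on every boot, wipe_table: true would destroy the data on each boot, so that case needs a different approach.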

If you’re looking to mirror your boot disk, there’s special Butane syntax for that; see the docs for details.
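For reference, the mirrored boot disk syntax looks roughly like this. It requires Butane spec fcos 1.3.0 or later; /dev/sda and /dev/sdb are the two disks from the first post. As I understand the docs, coreos-installer is still pointed at one plain disk (not at an md device). On first boot, Ignition then lays out both disks, putting /boot and root on RAID1 and giving each disk its own EFI/BIOS boot partition. Check the docs before relying on this.

```yaml
variant: fcos
version: 1.3.0
boot_device:
  mirror:
    devices:
      # Both disks are repartitioned on first boot
      - /dev/sda
      - /dev/sdb
```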


Hi,

It’s working now. Here is my Butane config:

storage:
  raid:
    - name: Raid
      level: mirror
      devices:
        - /dev/disk/by-id/ata-WDC_WD10SPZX-80Z10T2_WD-WX41A49H9FT4
        - /dev/disk/by-id/ata-WDC_WD10SPZX-80Z10T2_WD-WXL1A49KPYFD
      options:
        - --metadata=1.2
        - --assume-clean
        - --uuid=7ec8d4df:823fae52:c55d5e56:e773b281

  filesystems: 
    - path: /var
      device: /dev/md/Raid
      format: xfs
      label: Var
      wipe_filesystem: false
      with_mount_unit: true

I don’t remember exactly how I did it, but for the first boot you can remove the --assume-clean and --uuid mdadm options and wait for the RAID array to finish its initial build.

--assume-clean prevents the full check/build/resync process at each reboot (remember, I’m in a diskless live PXE environment).
--uuid: not sure why; perhaps because a new UUID was generated after each reboot, can’t say ^^

You may also need to set wipe_filesystem to true for the initial setup; I can’t remember.

Cheers for this information; interesting point about --assume-clean.

My issue turned out to be data left on the disk from previous RAID attempts; once that was removed, the CoreOS install went like a breeze.

Thanks for the help.

If it was fixed, what do you need to do now to enable RAID1?