Trying to set up a full RAID1 installation

Hi,

I’m still trying to install FCOS in a full RAID1 configuration on a bare-metal system.

I’ll try to be as clear as possible about my current progress :slightly_smiling_face:

  1. Boot from the live USB ISO.
  2. Add a new user, add it to the sudo group, and allow sshd password login for easier access via ssh (optional :wink:).
  3. Create a RAID array on the full devices:
    sudo mdadm --create /dev/md127 --level=1 --metadata=0.90 --raid-devices=2 /dev/sda /dev/sdb
  4. Run sudo coreos-installer install /dev/md127 --ignition /tmp/config.ign
  5. Manually run what would otherwise be done after the first reboot: growpart /dev/md127 4
  6. Mount the 4th partition (root) on /tmp/md4 and run: sudo xfs_growfs /tmp/md4
  7. Mount the 1st partition (/boot) on /tmp/md1 and update /tmp/md1/grub2/grub.cfg by adding the lines below at the top of the file (OK, it’s hardcoded; later I’ll try to list all devices labeled boot and choose one, giving priority to the RAID):
insmod mdraid1x
insmod mdraid09

set pager=1
search --label boot --set boot
#set root=$boot
set root=(md/md127,gpt1)
  8. Create /tmp/md4/ostree/deploy/fedora-coreos/deploy/{some very long ID :p}/etc/mdadm.conf with the output of sudo mdadm --detail --scan

  9. Check the result with lsblk -M -o NAME,PARTLABEL,MOUNTPOINT

    NAME      PARTLABEL  MOUNTPOINT
    loop0                /sysroot
,-> sda
'-> sdb
 `--md127
    |-md127p1 boot       /tmp/md1
    |-md127p2 EFI-SYSTEM
    |-md127p3 BIOS-BOOT
    `-md127p4 root       /tmp/md4
    sdc
    `-sdc1
    sdd
    `-sdd1
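
For reference, steps 3 to 6 above could be collected into a small script. This is only a sketch: the device names, mount point, and Ignition path are the ones used in this post, while DRY_RUN, run, and raid1_install are made-up names for illustration. It only prints the commands by default, because mdadm --create is destructive.

```shell
# Sketch of steps 3-6. DRY_RUN, run and raid1_install are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1 = md device, $2 and $3 = member disks, $4 = Ignition file
raid1_install() {
    md="$1"; disk_a="$2"; disk_b="$3"; ign="$4"
    # Step 3: RAID1 over the whole disks, with 0.90 metadata as in the post
    run sudo mdadm --create "$md" --level=1 --metadata=0.90 --raid-devices=2 "$disk_a" "$disk_b" &&
    # Step 4: install FCOS onto the array
    run sudo coreos-installer install "$md" --ignition "$ign" &&
    # Step 5: grow partition 4 (root) to fill the array
    run sudo growpart "$md" 4 &&
    # Step 6: mount the root partition and grow its XFS filesystem
    run sudo mkdir -p /tmp/md4 &&
    run sudo mount "${md}p4" /tmp/md4 &&
    run sudo xfs_growfs /tmp/md4
}
```

Calling raid1_install /dev/md127 /dev/sda /dev/sdb /tmp/config.ign prints the six commands; setting DRY_RUN=0 first would actually run them, stopping at the first failure.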

At this point everything looks right, but FCOS still boots from a single drive (e.g. /dev/sda) and mdmonitor.service returns an error …

Can you help me “hack” the end of the process, please?
My knowledge isn’t enough here: I need help telling the initramfs (or dracut, or something else?) how to find and use md127p4 as root. It’s right there, already used by grub … So near and yet so far! :smiley:

I know there’s a discussion on GitHub about handling RAID for /root.

I’m not an expert at all, but if I understand things correctly, some Ignition parts such as RAID should be handled in coreos-installer, rather than by running Ignition after “hardware” decisions like partitioning have been made and the disk setup is already done …

Thanks for reading :wink:

I think the problem is probably that your /etc/mdadm.conf isn’t in the initramfs. You might have to add some kernel args to make the raid array get assembled early on. AFAIK we haven’t paved this path just yet so (as you know) you’re operating a bit blind :slight_smile:.
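
One way to test that guess might be to list the contents of the initramfs with dracut’s lsinitrd and look for etc/mdadm.conf. A minimal sketch, where has_mdadm_conf is a made-up helper name and the image path is a placeholder you would need to fill in yourself:

```shell
# Reads an lsinitrd file listing on stdin and succeeds if it contains etc/mdadm.conf.
# has_mdadm_conf is a hypothetical helper name, not part of dracut.
has_mdadm_conf() {
    grep -q 'etc/mdadm\.conf$'
}

# Example (the image path is a placeholder):
#   lsinitrd /path/to/initramfs.img | has_mdadm_conf \
#       && echo "mdadm.conf is in the initramfs" \
#       || echo "mdadm.conf is NOT in the initramfs"
```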

Full support for root-on-RAID will be added by https://github.com/coreos/fedora-coreos-config/pull/503.

I think the final bit you’re missing for this hack is to add rd.md.uuid=... root=UUID=... on the kernel cmdline.
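
A rough way to collect the two values for those arguments, assuming rd.md.uuid wants the array UUID as reported by mdadm and root=UUID= wants the filesystem UUID as reported by blkid. The function names are made up for illustration, and how the arguments then get onto the kernel cmdline is not covered here:

```shell
# Hypothetical helpers; only mdadm, blkid, sed and printf are real commands.

# Print the array UUID of an md device, e.g. /dev/md127.
md_array_uuid() {
    sudo mdadm --detail --export "$1" | sed -n 's/^MD_UUID=//p'
}

# Print the filesystem UUID of a partition, e.g. /dev/md127p4.
fs_uuid() {
    sudo blkid -s UUID -o value "$1"
}

# Build the kernel arguments: $1 = array UUID, $2 = root filesystem UUID.
md_kargs() {
    printf 'rd.md.uuid=%s root=UUID=%s\n' "$1" "$2"
}

# Example, using the device names from the first post:
#   md_kargs "$(md_array_uuid /dev/md127)" "$(fs_uuid /dev/md127p4)"
```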

Right. It looks like he is doing a raid1 on the entire disk (not just for the root filesystem), though. I don’t know if we have any plans to address that, do we?

I think we should. Filed as https://github.com/coreos/fedora-coreos-tracker/issues/581

Hi

I’m coming back to my installation process, but in a different way …
I have a spare PC, so I decided to install FCOS on it via PXE, on a single HDD; everything works fine there …
But my question still stands.

On the PC with the RAID1 HDDs, I want to run a “diskless” PXE-booted FCOS with this Ignition file:

variant: fcos
version: 1.1.0

passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa blablabla

storage:
  raid:
  - name: MediaRaid
    level: raid1
    devices:
    - /dev/disk/by-id/ata-WDC_WD10SPZX-80Z10T2_WD-WXL1A49KPYFD
    - /dev/disk/by-id/ata-WDC_WD10SPZX-80Z10T2_WD-WX41A49H9FT4

  disks:
  - device: /dev/md/MediaRaid
    partitions:
    - number: 1
      should_exist: true
      label: RaidPart

  filesystems: 
    - path: /media
      device: /dev/disk/by-label/RaidPart
      format: xfs
      label: Media
      wipe_filesystem: false
      with_mount_unit: true

Now my concern is about RAID1 on the entire disk: it looks like FCOS tries to create partitions on “/dev/md/MediaRaid” before creating the “/dev/md/MediaRaid” RAID device … and fails to boot with this error:

disks: createPartitions op(1): [failed] waiting for device [/dev/md/MediaRaid]: device dev-MediaRaid.device timeout