Fedora Server: move to BTRFS RAID-1 on a Raspberry Pi SBC

Hi there! I have successfully installed Fedora Server on my SBC (Raspberry Pi 4, 8 GB), but the default filesystem configuration didn’t fit my needs, so I decided to make some changes.

/ and /home are not on separate partitions: the installer only creates a small 5 GB XFS volume and leaves the rest of the SD card space unused.

After discovering this, I tried to move the installation to a RAID-1 BTRFS filesystem shared between the microSD card and an external USB pendrive by doing this:

  1. Create new primary partitions on drives through cfdisk;
  2. Make the filesystem: mkfs.btrfs -f -m raid1 -d raid1 /dev/sda /dev/mmcblk0p4 (/dev/sda is the USB drive and /dev/mmcblk0p4 is the new partition on the SD card);
  3. Mount it: mkdir /mnt/UUID && mount /dev/sda /mnt/UUID;
  4. Create new Btrfs subvolumes: btrfs subvolume create /mnt/UUID/root, and the same for /home;
  5. Mount the old LVM root (fedora-root) so its data can be migrated: mkdir /mnt/lvm && mount /dev/mapper/fedora-root /mnt/lvm;
  6. Copy the data to the root subvolume: rsync -aAXv --exclude="/home/*" /mnt/lvm/ /mnt/UUID/root/;
  7. Copy the data from /home/ to its subvolume: rsync -aAXv /mnt/lvm/home/ /mnt/UUID/home/;
  8. Update /etc/fstab - keep the /boot and /boot/efi entries and add these:
UUID=btrfs-uuid /         btrfs  subvol=root  0 0
UUID=btrfs-uuid /home     btrfs  subvol=home  0 0
  9. Unmounted the LVM partition and rebooted the system;
  10. The system booted back up normally and mounted the new filesystem, so I then tried to delete the old LVM root with lvremove /dev/mapper/fedora-root; this failed, however, telling me that the volume was still in use (see the checks below);
  11. Frustrated, I used fdisk to delete the partition that contained the LVM volume group (mmcblk0p3) and killed my install :frowning:.
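In hindsight, before touching LVM I could have checked what the system was actually still running from. Something like this (hypothetical commands I did not run at the time; the volume group name fedora is the default) would probably have shown that / was still on the LVM volume:

  findmnt -n -o SOURCE /                      # which device is really mounted at /
  cat /proc/cmdline                           # the root= the kernel was actually booted with
  lvs -o lv_name,lv_path,lv_active fedora     # whether the old LV is still active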

What am I doing wrong here? How do I make sure the LVM partition is no longer used by the system? Do I have to run sudo systemctl daemon-reload after editing /etc/fstab? Is this even a reasonable thing to do?

All I want is a basic RAID filesystem between my Pi’s internal microSD card and an external USB flash drive of the same capacity, so there is some redundancy for the data stored there (I’m also planning to make an offsite backup of this filesystem if this ever works out).

Thank you for the help in advance!!

OK, I have a clue: I forgot the part about editing the GRUB configuration, updating GRUB, rebuilding the initramfs with dracut, and everything else. I think it will work now!
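Roughly what I think I was missing (an untested sketch; btrfs-uuid is a placeholder for the real filesystem UUID, and the rd.lvm.lv argument may or may not be present on your kernel command line): point root= at the new Btrfs filesystem, drop the old LVM hint, and rebuild the initramfs:

  sudo grubby --update-kernel=ALL --args="root=UUID=btrfs-uuid rootflags=subvol=root"
  sudo grubby --update-kernel=ALL --remove-args="rd.lvm.lv=fedora/root"   # if present
  sudo dracut --force --regenerate-all    # rebuild the initramfs for all installed kernels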

However, I just destroyed another Fedora installation because I didn’t know whether U-Boot counts as CSM or UEFI and ran the wrong command to update GRUB, I believe lol

When writing the image to the SD card using arm-image-installer, there is an option to extend the file system to the full size of the card during installation. Did you try that? I use a 64 GB card and have the entire space available.

If not, it can still be extended on the card using gparted.
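If I remember the flags correctly, it’s roughly this (an illustrative invocation, not copied from a real run; adjust the image name, target, and device to yours, and check arm-image-installer --supported and --help first):

  sudo arm-image-installer --image=Fedora-Server-<version>.aarch64.raw.xz \
       --target=rpi4 --media=/dev/sdX --resizefs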


I just extracted the raw installation image for Fedora Server aarch64 without using that tool or specifying any flags. It looks like something I might want to use, but it still doesn’t help me replace XFS with a Btrfs layout that has subvolumes for / and /home (the way Fedora Workstation does by default), mirrored in RAID-1 with the USB drive.

However, I do think I was on the right track to getting it to work the way I described in the post, but I ran sudo grub2-mkconfig -o /boot/grub2/grub.cfg instead of sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg (which I understood to be the preferred method on UEFI systems), thus nuking my install (it can no longer boot).

I don’t think the XFS file system can be auto-extended by the image installer. Instead, you could still use gparted for that purpose.
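If you’d rather not use a GUI, the same growth can be done from the command line; this is a rough sketch assuming the default fedora/root LV and XFS, with the LVM partition being number 3 on the SD card:

  sudo growpart /dev/mmcblk0 3               # grow the partition (cloud-utils-growpart)
  sudo pvresize /dev/mmcblk0p3               # grow the LVM physical volume
  sudo lvextend -l +100%FREE fedora/root     # grow the logical volume
  sudo xfs_growfs /                          # grow XFS online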

Hmm, I get it. Could I also set up the Btrfs filesystem the way I want using gparted?

Yes, of course.

That was the preferred location on systems before about Fedora 33, and it has been deadly for kernel updates ever since. The only ‘real’ grub.cfg file is now the one at /boot/grub2/grub.cfg. The other is only a pointer that redirects UEFI GRUB to that one.
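On current releases that pointer is just a small stub that locates the real config on /boot, something along these lines (the UUID is whatever your /boot filesystem has):

  search --no-floppy --fs-uuid --set=dev <uuid-of-/boot>
  set prefix=($dev)/grub2
  export $prefix
  configfile $prefix/grub.cfg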

In your case the SBC does not use GRUB on the RPi.

So I should make the necessary changes there and then run sudo grub2-mkconfig -o /boot/grub2/grub.cfg?

GRUB is irrelevant on the RPi.

I get it. How could I change the filesystem from which the kernel is loaded in this scenario?

It’s intentional. At the time of the partitioning design decision, we weren’t sure what the use cases for storage would be, so the decision was to leave much of it unallocated. You can create a new LVM logical volume, or grow any existing logical volumes, using the rest of the space. At the time, another option was to use the space for device-mapper thin volumes for use with containers. Today, most everyone is using overlayfs.

Yeah, migrations are not straightforward at all, because the root that gets mounted is not defined in /etc/fstab; it’s on the kernel command line. And the bootloader configuration in Fedora follows a mix of GRUB and the BootLoaderSpec. You will not find it well documented anywhere that BLS snippets live in /boot/loader/entries, or anything about /etc/kernel/cmdline - and the change can render the host-only initramfs invalid, also breaking boot. I’m really good at doing migrations, and yet there are enough steps that I pretty much always manage to forget one and break boot.
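For reference, a BLS snippet in /boot/loader/entries/ looks roughly like this (simplified, with placeholders; on some releases the options line just references a grubenv variable instead), and the options line is where root= actually lives:

  title Fedora Linux (<kernel version>)
  linux /vmlinuz-<version>
  initrd /initramfs-<version>.img
  options root=UUID=<fs-uuid> ro rootflags=subvol=root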

This is why I end up recommending backup, clean install, restore. It ends up taking less net time because you avoid all the ensuing troubleshooting of edge cases.

Also, I can’t recommend Btrfs raid-1 if your use case requires unattended degraded boot.

Right now, there is a udev rule in place that makes systemd wait indefinitely for all Btrfs devices to appear before it will attempt to mount the file system. I’m not sure about the history of this rule. But let’s say you remove it: now, if any device appears even slightly later than the other, for whatever reason, the mount will fail. Btrfs doesn’t do automatic degraded mounts; you have to use a mount option. And that option isn’t intended to be set by default on the kernel command line as a boot option, because if you do, you can end up in a “split brain” scenario where the drives end up separately mounted degraded due to innocuous delays as seen by the kernel.

There is no automatic scrub after a missing drive reappears, so the mirrors can get out of sync, and scrub is the only way to fix that. And scrub can’t fix the case where drives A and B were separately mounted and written to in degraded mode - the filesystem becomes corrupt and unrepairable, though it’s probably still read-only mountable with the most recent rescue mount options.
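For completeness, the manual degraded mount and the follow-up scrub look like this (illustrative device and mount point, typically run from a rescue shell):

  mount -o degraded /dev/sda /mnt    # mount with one device missing
  btrfs scrub start /mnt             # after the missing device returns, verify/repair the copies
  btrfs scrub status /mnt            # check progress and errors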

The trouble with software RAID on Linux is that you need to be familiar with the degraded behaviors, which are unique to each implementation: LVM, mdadm, and Btrfs each have different pros and cons.


Ahh. Yeah, I’m no help when it comes to U-Boot; I have no idea where any of its config files are located.

Thank you for the insights! I now get why using Btrfs in this configuration isn’t really recommended.

This is why I end up recommending backup, clean install, restore.

What do you mean by the “restore” part of this process?

I’m starting out from an arm-image-installer (FedoraServer.raw.xz) SD card with the default Fedora LVM install (3 partitions: /boot, /boot/efi, and the LVM one). In this scenario, what would be the best approach to expand it to occupy the entire SD card and also mirror it to a USB drive for RAID-1-like redundancy?

Restore from backup.

Ahh. That does require some really specialized knowledge of mdadm and GRUB to build it up from a raw image.

There is a Network Installer option (Fedora Server | The Fedora Project) which will boot a graphical installer. If you don’t have a way to connect video to the Pi, you can edit the boot entry for the network installer image, adding inst.vnc, and then connect with a VNC client.
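For example, you’d append something like this to the installer’s kernel command line (inst.vncpassword is optional; the value here is just a placeholder), then connect a VNC viewer to the Pi’s IP address on display :1:

  inst.vnc inst.vncpassword=changeme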

In this case, you’re learning the idiosyncrasies of the installer. mdadm-managed RAID is a device type in the installer’s vernacular. The device types are standard (plain partitions), LVM, Btrfs, and RAID. I think in your case putting XFS directly on mdadm RAID is OK.

I’m trying to think of how else I would do this, but I’d really have to iterate and fiddle with it.

I would prefer Btrfs on mdadm RAID for the use case in which I need unattended degraded boot capability. I wouldn’t get self-healing in case of data corruption, since Btrfs RAID isn’t used. But Btrfs duplicates metadata by default, so metadata self-heal is still possible. And Btrfs still won’t let corrupt data make it to user space, and scrub can be used to check all data.
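A rough sketch of that layout, with illustrative device names and the data partitions already created:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/mmcblk0p4
  mkfs.btrfs /dev/md0                # single-device Btrfs; metadata is duplicated (DUP) by default
  mount /dev/md0 /mnt
  btrfs subvolume create /mnt/root
  btrfs subvolume create /mnt/home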

Btrfs has supported reflinks since forever, for efficient copy-up if your use case includes containers running on overlayfs.
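A reflink copy is just, for example:

  cp --reflink=always disk.img disk-copy.img    # instant copy; extents are shared until either file is modified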


OK so as it turns out, the installer doesn’t completely configure the bootloader correctly for RAID.

Neither the installer nor the bootloader are actually smart enough to install the bootloader onto both drives correctly, or maintain that setup as the system is updated and upgraded.

In the case of GRUB on UEFI, there is only one ESP. So if the drive with the ESP dies, no boot.

At one time there was a way to get mdadm to set up RAID for the ESP in such a way that the firmware didn’t know it was mdadm - it just saw two separate ESPs. The problem is that upstream md developers don’t like this: the resulting partitions are, strictly speaking, not ESPs but mdadm members. The firmware could choose to write to either of them, since this is allowed by the UEFI spec, and in that case it’s ambiguous which one is correct, which can lead to corruption of the ESP-on-RAID setup.
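The trick was the old 1.0 metadata format, which puts the md superblock at the end of the partition so the firmware just sees two plain FAT filesystems. A sketch only, given the caveats above, with example device names:

  mdadm --create /dev/md/esp --level=1 --metadata=1.0 --raid-devices=2 /dev/sda1 /dev/sdb1
  mkfs.fat -F32 /dev/md/esp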

And then there’s the view that this can only work with firmware RAID or hardware RAID, neither of which really exists outside of x86.

Enter coreos bootupd.

This may one day help deal with the separate-drives use case, making sure all the copies stay synced with some reference. It would also be useful for repairing such a bootloader partition in case it becomes corrupt, goes missing, or the drive needs replacement.

So… in reality, my earlier statement about not recommending Btrfs for unattended degraded boot is mooted by the fact that we don’t have the bootloader mirrored either.

Ironically, GRUB on x86 BIOS systems does deal with this use case correctly, and so does the Fedora installer: it runs grub2-install on all the member drives after installation. So every drive has not only a copy of the bootloader, but that embedded bootloader on each drive also contains the drivers needed to find the grub.cfg wherever it’s located. We’ve lost this simplicity with more complicated booting schemes, including UEFI.


Thank you!! I’ll be trying to configure my drives through the Network Install option instead of attempting to move the system live as I had been doing previously.

If I can use Btrfs I’ll stick with it, given the flawless experience I’ve had with it on my desktop computer when it comes to subvolumes and snapshots.

The bootloader not being copied to the USB pendrive isn’t really a big deal for me, since this is not a high-availability project but a very basic home lab server. What I fear is one of these cheap consumer-grade SD cards dying (industrial ones are way more reliable) and making me lose data; that’s why I’m looking to set up some kind of redundancy. If the SD card dies it will surely require manual intervention to flash a new one and rebuild the RAID-1 filesystem, but at least my data would be safe (in theory).

Also, the CoreOS bootupd suggestion was great; unfortunately, it doesn’t yet work on SBCs :frowning: