/boot is mounted read-only by boot.mount

A few days ago, my /boot partition started showing up mounted read-only by the generated boot.mount systemd unit. The only hint I’ve gotten from journalctl is this:

mount[1155]: mount: /boot: WARNING: source write-protected, mounted read-only.

I have no idea why it thinks the source /dev/sda1 is write-protected. It isn’t, and I’ve run many variations of e2fsck on that ext4 partition, including -c. There’s no problem indicated. I can remount it read-write and it works just fine. But the next time I boot, /boot is ro again. In fact, my temporary work-around is this /etc/systemd/system/boot-to-rw.service file:

[Unit]
Description=Remount /boot as read-write.
Requires=boot.mount
After=boot.mount

[Service]
Type=oneshot
ExecStartPre=grep boot /etc/mtab
ExecStart=mount -v /boot -o remount,rw
ExecStartPost=grep boot /etc/mtab

[Install]
WantedBy=multi-user.target

But that’s a silly thing to do. The greps before and after the mount command show it’s ro before and rw after.
How can I get more information on why systemd-mount thinks the source is write-protected?

Just out of curiosity, how much free space do you have on that device?
“df” will tell you.
What is the ownership and permissions on /boot with the device not mounted?, and then with it mounted?

[root /]# df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       2.0G  215M  1.7G  12% /boot

[root /]# ls -lZdi /boot
2 drwxr-xr-x. 7 root root system_u:object_r:boot_t:s0 4096 Aug 14 15:06 /boot

[root /]# umount /boot

[root /]# ls -lZdi /boot
3538945 drwxr-xr-x. 2 root root unconfined_u:object_r:root_t:s0 4096 Aug 13 21:55 /boot

The “unconfined_u” is the only thing I see on the permissions that even has a possibility of a problem. Don’t think so but might. Mine shows the same so not likely.

plenty of space.

I am stumped at this point, but maybe someone else has a clue.

Interesting…
I just unmounted /boot to check permissions like you did, and when I remounted it the permissions are now dr-xr-xr-x.
Using chmod I switched it back to drwxr-xr-x. but I have never seen that before.

I will reboot and see if there is a problem or just a fluke of how I did it this time.
EDIT
I just rebooted and /boot mounted properly.
When I was testing above I did a umount, then when I remounted it I used “mount -a” which should have mounted it properly.
Weird!

This is curious. If I cause dumpe2fs to run before boot.mount (by adding another service with a “Before=boot.mount”), then boot.mount will mount /boot read-write, and the invoked mount no longer warns “source write-protected, mounted read-only.”

You can try something like this:

  • Reformat the problematic partition and reinstall the related packages.
  • Fix the UUID, restore the bootloader and regenerate the initramfs.

I could do those things, but if they made the problem go away I’d never find out what was wrong.

BTW, I have resized the partition with gparted, going from 1GB to almost 2GB. That process never reported any sort of error. That was when I initially thought the problem was lack of free space, since dnf upgrade said it lacked N MB of space on /boot. There was plenty of space, it just wasn’t writable.

Perhaps the problem is unrelated and may persist.
That way should help you isolate the issue.
You can back up the entire partition beforehand.

Wow. Interesting, tedious, fiddly details ahead, but I now kind of know what’s happening.
First of all: nothing’s broken. There’s no disk or filesystem problem. It’s purely configuration choices interacting in timing-dependent ways.

On boot, systemd-fstab-generator creates various *.mount unit files corresponding to /etc/fstab entries and places them in /run/systemd/generator. However, you can also copy, say, /run/systemd/generator/boot.mount to /etc/systemd/system/boot.mount, and systemd will prioritize use of this static copy over the generated one. Then you can customize it by adding environment variables that mount uses to boost its logging. I added the “Environment=” line and the [Install] sections below to my /etc/systemd/system/boot.mount file:

...
[Mount]
Environment=LIBMOUNT_DEBUG=all
Where=/boot
What=/dev/sda1
Type=ext4
Options=defaults,rw

[Install]
WantedBy=multi-user.target

With no other changes, this reproduces the problem but adds significant debugging information from the mount process to the system journal. Crucially, it shows the actual error from the initial attempt to mount /dev/sda1 read-write. It isn’t that the device is write-protected as the later message claims, but rather that the device is busy. Something is already using that filesystem, and that something is LUKS. Or rather systemd-cryptsetup@.service, or maybe run-systemd-cryptsetup-keydev*.mount. That part isn’t exactly clear, and it’s so far down in the weeds I’m not sure I care. But here’s what’s going on.

Except for the /boot partition, all the others on this machine are encrypted. There are three LVM volumes in a LUKS encrypted partition on /dev/sda, and one more on /dev/sdb. There are several ways for LUKS to get a password, and if they fail, the universal fallback is to prompt for it at the console. But suppose that machine is not readily available via a console — like in a lights-out server room, attic, or basement across town. It’s possible to configure your /etc/crypttab so that it will attempt first to read the password from a specific file in /boot (the only un-encrypted partition available). You then create the file when you need to, then reboot remotely. The system boots into its initrd, finds the password, decrypts the other partition(s), completes booting, and then you login and delete that file. If you reboot and that file doesn’t exist, then the fallback of prompting at a console is back in play. (You can also configure a special ssh session with keys that allows you to remote in to the partially booted system and enter a LUKS password that way, but that’s out of scope for our purposes.)

The catch is, while system services are still getting the various LUKS partitions unlocked and checking the filesystems therein, the device/partition containing that password is in use, and you can’t go messing with that mount. Basically, it’s already been mounted by the time boot.mount is invoked. You could do a -o remount,rw, but a straight-up mount will fail. If you add a few silly services like I did up-thread, then you change the timing of things such that LUKS and its systemd friends are done (maybe), control of /dev/sda1 (maybe) is relinquished by the time boot.mount runs, and the mount (sometimes) works.

Maybe if you’re going so far down the rabbit hole as to have your encrypted system sometimes able to reboot with a temporary LUKS password in a temporary file in /boot, then adding a silly “Remount /boot as read-write” service isn’t so much farther down the rabbit hole after all.

1 Like

Finally got around to posting (has it been that long?) a bug report about this, only to discover that in the mean time systemd / crypt-setup has learned to do the Right Thing and unmount the file systems it mounts to retrieve key files. I can remove my hacky work-arounds and things still work.

Thanks, systemd folks. Keep up the good work.

2 Likes

It seems I’m doomed to revive this thread every year or so. The last time I tried to upgrade, from F37 to F38, this old problem bit me again. It manifests in the system upgrade claiming that /boot is short on space by about 50MB, when the actual problem is that /boot is mounted read-only.

My old work-around (top of this topic) is a systemd unit file that performs a remount. Unfortunately, the Install section listed it as WantedBy=multi-user.target. While that works for day-to-day use, that target isn’t in play during a system upgrade. I’m wondering now whether to add WantedBy=system-update.target. That target has AllowIsolate=yes, so I need to re-read the docs on that to better understand the ramifications. Alternatively I could add PartOf=sysinit.target which may cover both daily use and system upgrades.

Fedora’s twice-per-year version upgrade pace — which seems plenty fast for upgrades — is rather slow for experimental system administration tasks. I’d prefer to come up with the right solution rather than something that seems to work mostly. Suggestions are very welcome. Thanks.

Can you update info on this issue?

sudo smartctl -x /dev/sda
sudo umount /boot{/efi,}
sudo fsck.ext4 -f -c -D /dev/sda1
sudo mount -a
grep -e /boot /etc/{fstab,mtab} <(lsblk -O)

Ah! Bone-headed mistake on my part.

Thanks, @vgaetera , for prodding me to look further. It appears that in a prior effort to debug my boot-to-rw.service file, I had changed my /etc/fstab options for /boot to defaults,ro and had forgotten to remove the ,ro part when I was done.

I’ve chosen not to post the results you asked for because they would only present a distraction from the actual issue. We know what was initially setting /boot to ro (/etc/crypttab pointing to a LUKS key file on /boot causing systemd-cryptsetup and friends to mount /boot ro) and what was erroneously causing it this time (my forgetting to remove ro from the mount options for /boot in /etc/fstab).

Everything seems to be in place now for a successful F38→F39 system version upgrade even with the LUKS key file being read from /boot. And, should that present any “interesting” new information, I’ll add it here. (Although I expect this issue to be well and truly over at this point.)

[BTW, my /boot on /dev/sda1 is a vfat fs, so fsck.ext4 -f -c -D /dev/sda1 would not have been helpful.]

1 Like

Then this config looks wrong:

In any case, I’m glad you managed to identify the root cause of the issue.

:man_facepalming: You’re right; it is wrong. Since that was posted various filesystems have been resized and rearranged. /dev/sda1 is now /boot/efi:

What=/dev/disk/by-uuid/323A-42E7
Where=/boot/efi
Type=vfat

while /dev/sda2 is now /boot:

What=/dev/disk/by-label/booty2shoes
Where=/boot
Type=ext4

I’m using labels now, totally forgetting which was sda1 vs sda2 or that they have been swapped. (I should get a grown-up to run these systems. Sheesh.)

Sharp eyes you’ve got there! Thanks for looking.