My suggestion is to focus on diagnosis, not repair. This might be harder than fixing it, mainly because it takes more patience. But it’s also safe. Repairs aren’t always safe.
Corrupt systemd journals are semi-common, and journalctl skips over the corrupt entries. The whole journal file isn’t corrupt, just one or two entries, and the file gets rotated to avoid further problems with it.
We really need to see kernel messages related to mounting the file system. All file system mount problems are reported as kernel messages, so yeah, we need dmesg.
How to get it?
You could add rd.break=pre-mount as a kernel parameter, which will get you a prompt while still in the initramfs environment. It’s super limited and directories aren’t quite what you’d expect. Once at the prompt you can run blkid and identify the Btrfs partition. Then simply try to mount it: mount /dev/sdXY /sysroot (the mount point /sysroot is unique to the initramfs environment; other, more familiar locations don’t exist yet). If the problem is with Btrfs, this mount will fail, and now you can run dmesg, take a cell phone photo of the output, and post it here.
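The steps at the initramfs prompt can be sketched roughly as below. This is only an illustration: /dev/sdXY is a placeholder for whatever blkid reports, btrfs_lines is a made-up helper name, and I'm assuming grep is present in your initramfs (it usually is, but the environment is minimal). The filter is just for reading on a small screen; the full dmesg output is still the most useful thing to post.

```shell
# Sketch only: type these at the rd.break=pre-mount prompt.
# /dev/sdXY is a placeholder, substitute the partition that blkid shows.
#   blkid                       # look for the entry with TYPE="btrfs"
#   mount /dev/sdXY /sysroot    # expected to fail if Btrfs is the problem
#   dmesg | btrfs_lines         # what the kernel said about Btrfs

# Hypothetical helper: keep only the kernel log lines that mention Btrfs.
btrfs_lines() {
    grep -i 'btrfs'
}
```

If the filtered output looks incomplete, fall back to plain dmesg, since errors from the block layer or the drive itself won't mention Btrfs by name.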
Still another option is USB install media (any will do) and doing something similar. Boot it, open a terminal, identify the partitions, mount the Btrfs partition (or try to), and then run dmesg to see what the kernel says the issue is.
OK, I re-read the thread and see it was a one-time read-only event. In that case, you can just reboot and it should ordinarily recover on its own. But you’re right to be suspicious.
I would use journalctl to look at previous kernel messages that have been recorded. However, because the file system went read-only, the log might not have these messages because they couldn’t be written. In that case we’d need dmesg from the time the error occurred. My guess is the file system became confused and went read-only to avoid the confusion being written to disk, which is the correct behavior.
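For looking at kernel messages from an earlier boot, something like the following should work, assuming the journal is persistent (the default on Fedora). The function name prev_boot_kernel_log is just mine for illustration; the journalctl options used are -k (kernel messages only) and -b with a negative offset (which boot to show).

```shell
# List the boots the journal knows about, to pick the right offset:
#   journalctl --list-boots

# Hypothetical wrapper: kernel messages from an earlier boot.
# Offset -1 (the default here) means the boot before the current one.
prev_boot_kernel_log() {
    journalctl -k -b "${1:--1}" --no-pager
}

# Example use:
#   prev_boot_kernel_log        # previous boot
#   prev_boot_kernel_log -2     # the boot before that
```

If the output ends abruptly with nothing about Btrfs, that fits with the messages never reaching the disk once the file system went read-only.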
Btrfs has a read-time and write-time tree checker that verifies the correctness of certain very important metadata, including before it is written to disk. If something is wrong (even a bit flip), the tree checker triggers and flips the file system read-only to prevent the confusion from being written to disk. It could be something like that. But what’s next depends on the actual error rather than speculation.
It’s safe to do drive tests, but also a good idea to do memory tests.
There are some tradeoffs. Pre-boot environment testers like memtest86 (and descendants) test more memory but can be difficult to use. Linux user space testers like memtester (in the Fedora repo) are easier to use but test less memory, because Linux has to be booted first and any memory already in use when the test starts is not tested. You can limit this side effect somewhat by booting with the kernel parameter 3 (same as systemd.unit=multi-user.target), which will boot to a prompt and leave out the entire graphical target stack, saving quite a lot of memory for testing.
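As a rough sketch of how you might size a memtester run once booted to the text prompt: the helper below reads MemAvailable from /proc/meminfo and holds some memory back so the system itself doesn't get starved. The name memtester_size_mb and the 512 MiB default reserve are my own assumptions, not anything memtester prescribes, so adjust to taste.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print how
# many MiB to hand to memtester, keeping reserve_mb (default 512) back for
# the running system. Prints 0 if the reserve exceeds what is available.
memtester_size_mb() {
    awk -v reserve_mb="${1:-512}" '
        /^MemAvailable:/ {
            mb = int($2 / 1024) - reserve_mb
            if (mb < 0) mb = 0
            print mb
        }'
}

# Example use (ideally as root, so memtester can lock the memory it tests):
#   size=$(memtester_size_mb < /proc/meminfo)
#   memtester "${size}M" 1      # one pass over that many MiB
```

More passes take longer but are more likely to catch intermittent errors; a single pass is only a quick first look.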