Dracut emergency mode - need help diagnosing boot problem

I booted up today and was dropped straight into the emergency shell.

My device is a Tuxedo Pulse 14 (Ryzen with onboard graphics),
running Fedora 41 with kernel 6.15.4.

I've attached the rdsosreport, but I can't get a clue from it: rdosreport - Pastebin.com
Looking into journalctl the most likely cause is this error:

BTRFS error (device dm-0 state E): open_ctree failed: -5

Failed to mount sysroot.mount - sysroot

So I guess I somehow need to fix the partition then? Any suggestions on how to proceed?

Try tagging your post with btrfs to get @chrismurphy's attention. He is the main Btrfs expert around here. 🙂


Thank you, I hope @chrismurphy will pick up on it.

[ 16.154325] fedora kernel: BTRFS info (device dm-0): start tree-log replay

[ 16.343179] fedora kernel: BTRFS: error (device dm-0) in btrfs_replay_log:2080: errno=-5 IO failure (Failed to recover log tree)

It seems you're experiencing the same issue as this user: a problem with the log tree.

zero-log

clear the filesystem log tree

This command will clear the filesystem log tree. This may fix a specific set of problems where the filesystem mount fails due to the log replay. See below for sample stack traces that may show up in the system log.

The common case where this happens was fixed a long time ago, so it is unlikely that you will see this particular problem, but the command is kept around.

Note

Clearing the log may lead to loss of changes that were made since the last transaction commit. This may be up to 30 seconds (default commit period) or less if the commit was implied by other filesystem activity.

One can determine whether zero-log is needed according to the kernel backtrace:

? replay_one_dir_item+0xb5/0xb5 [btrfs]
? walk_log_tree+0x9c/0x19d [btrfs]
? btrfs_read_fs_root_no_radix+0x169/0x1a1 [btrfs]
? btrfs_recover_log_trees+0x195/0x29c [btrfs]
? replay_one_dir_item+0xb5/0xb5 [btrfs]
? btree_read_extent_buffer_pages+0x76/0xbc [btrfs]
? open_ctree+0xff6/0x132c [btrfs]

If the errors are like above, then zero-log should be used to clear the log and the filesystem may be mounted normally again. The keywords to look for are ‘open_ctree’ which says that it’s during mount and function names that contain replay, recover or log_tree.
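
The keyword check described above can be sketched as a small shell helper. This is only an illustration, not part of btrfs-progs: `needs_zero_log` is a made-up name, and it simply looks for `open_ctree` plus one of the replay/recover/log_tree keywords in whatever kernel log text is piped into it.

```shell
#!/bin/sh
# Hypothetical helper (not a btrfs-progs command): reads kernel log text on
# stdin and succeeds only if it contains BOTH
#   - "open_ctree" (the failure happened during mount), and
#   - a name containing "replay", "recover" or "log_tree" (log replay failed).
needs_zero_log() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'open_ctree' || return 1
    printf '%s\n' "$log" | grep -Eq 'replay|recover|log_tree'
}

# Example use against the kernel messages of the current boot:
#   journalctl -k -b | needs_zero_log && echo "looks like a log replay failure"
```

A match only says the symptoms fit the description above; it does not prove that zero-log is the right fix, and the note about losing recent changes still applies.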


Thanks for the hint. Do I need to run

btrfs rescue zero-log dm-0 (which does not work)?

On which device should I apply this? How can I determine this from the error message?

btrfs rescue zero-log /dev/disk/by-uuid/d5705694-cd96-4560-a6f3-54cfd2fdc3b3
This seems consistent with how I read the log, but I cannot run it because the resource is busy:
ERROR cannot open device … resource busy
ERROR could not open ctree

Does it seem like the device is mounted? Are you running the command from a Fedora live system?
According to the logs, the device should be /dev/dm-0.

sudo btrfs rescue zero-log /dev/dm-0
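
If it helps to see where a kernel name like dm-0 comes from: device-mapper volumes expose their mapper name in sysfs, so the name from the log can be translated to a /dev/mapper path. A minimal sketch, assuming the usual sysfs layout; `dm_to_mapper_path` is a made-up helper name, and the second argument exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: resolve a kernel device-mapper name such as "dm-0"
# to its /dev/mapper path by reading <sysfs>/block/<name>/dm/name.
# $1 = kernel name from the log (e.g. dm-0)
# $2 = sysfs root, defaults to /sys (override only for testing)
dm_to_mapper_path() {
    kname=$1
    sysfs=${2:-/sys}
    name_file="$sysfs/block/$kname/dm/name"
    if [ ! -r "$name_file" ]; then
        echo "no device-mapper entry found for $kname" >&2
        return 1
    fi
    printf '/dev/mapper/%s\n' "$(cat "$name_file")"
}

# Example use:
#   dm_to_mapper_path dm-0
```

On a system where the LUKS volume is unlocked, /dev/dm-0, the /dev/mapper path, and the /dev/disk/by-uuid link for the filesystem should all refer to the same block device.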

So dm-0 is not something the system can work with. I'm now trying to run zero-log from a live system; maybe that will do the trick. Googling suggested that if zero-log errors out in the dracut emergency shell, the problem might be severe enough that I can only fix it from another system.

Ok, writing this now from my laptop.

So @emanuc was completely right.

btrfs rescue zero-log “path to device” did the trick.

In my case this did not work from the dracut emergency shell. I created a Fedora Workstation live USB stick, booted into it, unlocked my LUKS device, and executed

btrfs rescue zero-log /dev/disk/by-uuid/d5705694-cd96-4560-a6f3-54cfd2fdc3b3
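
For anyone finding this later, the live-USB sequence can be sketched roughly as below. The function only prints the commands rather than running them, so nothing is touched by accident. The name `zero_log_plan`, the example partition, and the mapper name `rescue_root` are placeholders of mine, not values from this thread; check the real encrypted partition with `lsblk -f` first.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for clearing the Btrfs
# log tree of a LUKS-encrypted root from a live system.
# $1 = encrypted partition (placeholder; find yours with: lsblk -f)
# $2 = mapper name for the unlocked volume (any unused name)
zero_log_plan() {
    luks_part=$1
    mapper=${2:-rescue_root}
    # 1. Unlock the LUKS container; this prompts for the passphrase.
    printf 'cryptsetup open %s %s\n' "$luks_part" "$mapper"
    # 2. Clear the log tree on the unlocked, unmounted filesystem.
    printf 'btrfs rescue zero-log /dev/mapper/%s\n' "$mapper"
    # 3. Lock the container again before rebooting.
    printf 'cryptsetup close %s\n' "$mapper"
}

# Example: review the output, then run the printed commands as root.
#   zero_log_plan /dev/nvme0n1p3
```

The filesystem must not be mounted while zero-log runs, and changes written since the last transaction commit may be lost.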

Thanks!


Fedora 39 has been EOL for about a year, and AFAIK the 6.15.4 kernel was never released for that version. Are you actually using F39, or are you using F41 or F42 (both of which are still supported and have the 6.15.4 kernel)?

Sorry, I mistyped. I have corrected it to Fedora 41.


Is there any measure one can take to prevent this type of "zero-log error"? Is this a bug in Btrfs, or is it caused by user error?

It could be a kernel bug or a hardware issue. Chris provided an explanation of the possible causes. It has been reported upstream, and one of the developers will surely look into it.
