BTRFS error on boot Fedora 40

Hi, I’m looking for advice on my Fedora 40 workstation, which does not boot from its BTRFS-formatted SSD.

I am stuck in a rescue-mode root shell where I can read /run/initramfs/rdsosreport.txt, but it is hard to read. Some lines:

fedora kernel: BTRFS error (device nvme0n1p3) in _btrfs_free_extent:3066: errno=2 No such entry
and
fedora kernel: BTRFS error (device nvme0n1p3): open_ctree failed
and
fedora systemd[1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a

I ran btrfs check -p /dev/nvme0n1p3, which twice reports an incorrect local backref count, plus backpointer mismatches and “errors found in extent allocation tree or chunk allocation”.

I’m not sure what to do. What are my options here?

Was there some event that might have caused a corrupt file-system such as power failure, disk full, or unsafe restart?

You should be able to save the full rdsosreport.txt to a USB key. If the drive has failed, you should consider data rescue rather than attempting a repair. Boot Fedora from another drive (Live USB key) and use Gnome Disks to check the drive health. You will also likely need a way to boot Fedora to collect more details and attempt rescue or repair.
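Since the report is hard to read in full, here is a minimal sketch for pulling out only the Btrfs and sysroot.mount lines before copying things to the USB key. The function name `report_summary` and the `/mnt/usb` mount point are assumptions for illustration, not anything Fedora provides.

```shell
# report_summary FILE: print only the Btrfs and sysroot.mount lines
# from a dracut report such as /run/initramfs/rdsosreport.txt.
report_summary() {
    grep -E 'BTRFS|sysroot\.mount' "$1"
}

# Example (in the emergency shell, with a USB key already mounted at
# /mnt/usb, which is an assumed mount point):
#   report_summary /run/initramfs/rdsosreport.txt > /mnt/usb/btrfs-lines.txt
#   cp /run/initramfs/rdsosreport.txt /mnt/usb/
```

Keeping the full report next to the filtered lines means nothing is lost if someone later asks for more context.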

If the drive has important data you should consider cloning it to another drive so you get a second chance if repair attempts make things worse.

Thanks George. I’m not aware of an event that may have caused the problem. There have been some hiccups from time to time, but I can’t really recall what they were.
Will try to take the M.2 disk out to put it in an enclosure with usb-c connection so I can try and clone it on another Fedora desktop. Complicated stuff though.

Same problem here. Since my problem is very close to this one, I guess it is not worth opening a new thread for slight differences. In my case I can boot with kernel 6.8.5 but not with 6.8.9 or 6.8.10. I did not test Fedora 40 with kernel 6.8.8 but, if I remember correctly, Fedora 39 ran fine with kernel 6.8.8 before the upgrade and did not boot after upgrading to Fedora 40. Because I had upgraded via Discover, I attributed the problem to an immature upgrade procedure, so I reinstalled Fedora 40 from scratch. My installation is on an external 4GB NVMe disk and, after the installation, I shrank the btrfs partition to only half the disk capacity to create an NTFS partition for exchanging data with Windows. I checked all my configuration files and the only instruction I found recalling “resume” is GRUB_DISABLE_RECOVERY=“true” in /etc/default/grub.
Since I initially installed Fedora without a swap partition, I later created and activated one, but the problem persists.

If you can boot an older kernel you do not have the same problem (corrupt file-system). There have been many topics where a system boots older kernels but fails to boot a new kernel, so you may find there is already a solution. If you don’t find a solution, please start a new topic. You will need to provide details of your hardware. Please include the output from inxi -Fzxx (as pre-formatted text, using the </> button from the top line of the text entry window).

Hi George, I will follow your suggestion and post a screenshot of the error in the new thread, but with kernels .9 and .10 I too get a corruption error on the btrfs partition. That’s why I said I have the same error, even if it does not extend to all releases of kernel 6.8.x.

So, it took some time, but I have now been able to boot the machine from a USB disk, again with Fedora 40. In Gnome Disks the Samsung SSD shows 3 partitions: one FAT and one Ext4, which are both OK, and the biggest, which is btrfs and corrupt. Somehow I can’t copy the output of the filesystem check in Disks, so I checked it with btrfsck, which gave this output that I could copy:
[1/7] checking root items
[2/7] checking extents
data extent[28667670528, 167936] referencer count mismatch (root 256 owner 11917756 offset 0) wanted 0 have 1
data extent[28667670528, 167936] bytenr mimsmatch, extent item bytenr 28667670528 file item bytenr 0
data extent[28667670528, 167936] referencer count mismatch (root 256 owner 11917756 offset 1048576) wanted 1 have 0
backpointer mismatch on [28667670528 167936]
ref mismatch on [139609505792 16384] extent item 0, found 1
tree extent[139609505792, 16384] root 2 has no backref item in extent tree
backpointer mismatch on [139609505792 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
there is no free space entry for 139609505792-139609538560
cache appears valid but isn’t 139608457216
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 88898514944 bytes used, error(s) found
total csum bytes: 65471940
total tree bytes: 1646575616
total fs tree bytes: 1463369728
total extent tree bytes: 90701824
btree space waste bytes: 410204712
file data blocks allocated: 159392272384
referenced 106544926720

I have found an empty HD of the same size as the SSD, so I can copy the faulty partition or the whole SSD before trying to fix it. However, I cannot see a copy or clone mechanism in Disks. How can I make a perfect copy of the faulty btrfs partition?

Btw, after noticing this error in rdsosreport.txt:
"[ 0.129863] fedora kernel: DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x000000007b800000-0x000000007fffffff]"
I updated the BIOS, but that made little difference in this matter.

See if you can mount the partition in “rescue mode”: sudo mount -o ro,rescue=all /dev/nvme0n1p3 /mnt
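If the rescue mount works, the next move is usually to copy the data off before touching anything else. A rough sketch of that, assuming rsync is available on the live system and there is a healthy destination with enough space. `rescue_copy`, the `RESCUE_MNT` override and the example destination path are hypothetical names for illustration.

```shell
# rescue_copy DEVICE DEST: mount DEVICE read-only with every rescue
# option enabled, copy its contents to DEST, then unmount. Run as root.
# RESCUE_MNT is a hypothetical override for the temporary mount point.
rescue_copy() {
    local dev=$1 dest=$2 mnt=${RESCUE_MNT:-/mnt/btrfs-rescue} rc
    mkdir -p "$mnt" "$dest" || return
    mount -o ro,rescue=all "$dev" "$mnt" || return
    # -a keeps permissions and times; -H, -A, -X keep hard links, ACLs, xattrs
    rsync -aHAX "$mnt"/ "$dest"/
    rc=$?
    umount "$mnt"
    return "$rc"
}

# Example (destination path is an assumption):
#   rescue_copy /dev/nvme0n1p3 /run/media/liveuser/backup/rescued
```

A non-zero exit from rsync here may just mean some files could not be read, so it is worth checking its messages rather than assuming the whole copy failed.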

Gnome Disks can make an image file which can then be “restored” on the original drive (useful when the first repair attempt fails) or written to a new drive. You can (loop) mount partitions from the image with Gnome Disks, but that isn’t much use with corrupt filesystems.

For 1-off clones, I’ve always used dd on the command-line.

There are some technical points to consider when doing a direct clone of a disk. Arch Linux often has excellent documentation, and the majority of it applies to other current distros. https://wiki.archlinux.org/title/Disk_cloning has technical details you may want to review; see the section “Cloning an entire hard disk”.
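For reference, a dd clone along the lines of that wiki page might look like the sketch below. The wrapper name `clone_disk` and the example device names are assumptions; confirm the real names with lsblk, because everything on the target is overwritten. One Btrfs-specific caveat: the clone carries the same filesystem UUID as the original, and the Btrfs documentation advises against mounting either copy while both are visible to the same kernel.

```shell
# clone_disk SRC DST: raw copy of a whole disk or partition. Run as root.
# Everything on DST is overwritten, so double-check both names with lsblk.
clone_disk() {
    local src=$1 dst=$2
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: clone_disk SRC DST (must differ)" >&2
        return 1
    fi
    # noerror: continue after read errors; sync: pad short reads so offsets stay aligned
    dd if="$src" of="$dst" bs=64K conv=noerror,sync status=progress
}

# Example (device names are assumptions): clone_disk /dev/sdd /dev/sde
```

With conv=noerror,sync a single unreadable sector zero-fills the whole 64K block it sits in, so on a drive with real read errors a dedicated tool such as ddrescue is probably the better choice.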

I did so without a problem. The mount is successful. Fantastic! Thanks maestro! I saved my data by copying it. I will also try a dd clone.
Now I wonder whether this type of mount should somehow be used at startup, or whether a possibly dangerous btrfs repair should be attempted, or a scrub.

Glad your data are safe. The next step is to check the drive health as trying to repair file-systems on failing hardware is pointless. Gnome disks can run extended “drive health” tests, but drive vendors may have better tools (they know more about common failure modes).
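On the command line, the same health information can be read with smartctl from the smartmontools package. A minimal sketch, where `drive_health` is just a hypothetical wrapper name and the device name is taken from the earlier posts:

```shell
# drive_health DEVICE: print the overall SMART verdict, then the attribute
# table (or, for NVMe drives, the SMART/Health log). Run as root.
drive_health() {
    local dev=$1
    smartctl -H "$dev"
    smartctl -A "$dev"
}

# Example: drive_health /dev/nvme0n1
```

Drives in some USB enclosures do not pass SMART commands through, so this is likely to work best with the SSD in its original M.2 slot.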

Btrfs diagnostics and recovery have been described in a number of topics here that are worth reading to see what might be involved. The BTRFS Documentation has (work in progress) troubleshooting pages.

If you’re still having this problem, join https://matrix.to/#/#fedora:fedoraproject.org and poke me @cmurf.

I’m running your btrfs check --readonly up the flag pole to see if it’s safe to attempt repair.

A scrub will not be able to fix anything because rescue=all requires read-only mount, and I suspect scrub kernel messages will give us less useful information than the btrfs check.
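For anyone wanting to share a read-only check in the same way, a small sketch: save the output to a file, then count the main problem types so they are easy to quote. `check_summary` and the log file name are assumptions for illustration; the patterns match the message wording shown earlier in this topic.

```shell
# check_summary LOGFILE: count the main problem types in saved
# `btrfs check` output, for example captured with:
#   btrfs check --readonly /dev/nvme0n1p3 2>&1 | tee btrfs-check.log
check_summary() {
    local log=$1
    printf 'backpointer mismatches: %s\n' "$(grep -c '^backpointer mismatch' "$log")"
    printf 'ref mismatches: %s\n' "$(grep -c '^ref mismatch' "$log")"
    printf 'ERROR lines: %s\n' "$(grep -c '^ERROR:' "$log")"
}

# Example: check_summary btrfs-check.log
```

The counts are only a convenience; the full log is still what is needed to judge whether a repair is safe.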

I used Rescuezilla to clone the faulty disk. After that I ran:
box:~$ sudo btrfs rescue chunk-recover /dev/sdd3
Well… that command took a really long time to complete and to conclude:
“Check chunks successfully with no orphans”
But I still couldn’t mount the partition. So at last I issued the dreaded “btrfs check --repair” command.
And it did what I wished for. Me happy. And I repaired the original disk the same way.
Thanks for the guidance and advice!
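After a repair like this it may be worth confirming the result before trusting the filesystem again. A hedged sketch of one way to do that: run a read-only check, then mount normally and scrub so data checksums are verified as well. `verify_repair` is a hypothetical helper name and the example device and mount point are assumptions.

```shell
# verify_repair DEVICE MOUNTPOINT: re-check without writing, then mount
# and scrub in the foreground (-B) so data checksums are verified too.
# Run as root on an unmounted filesystem.
verify_repair() {
    local dev=$1 mnt=$2 rc
    btrfs check --readonly "$dev" || return
    mount "$dev" "$mnt" || return
    btrfs scrub start -B "$mnt"
    rc=$?
    umount "$mnt"
    return "$rc"
}

# Example (names are assumptions): verify_repair /dev/sdd3 /mnt
```

On a single drive a scrub can mostly only report damaged data rather than fix it, so any files it flags would need to be restored from the copy made earlier.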

Samsung does have a tool for SSD checks called Samsung Magician, but it is not available for GNU/Linux.