System freezes; suspect btrfs transid verify errors are the cause

Hi all,

I recently tried to copy a file over to an application and my whole system froze. (Edit: the freeze turned out to be unrelated; it was due to an actual kernel bug that has since been fixed.) I’m running a Bazzite/Silverblue machine, so I’m also using bootc with my own container. I’m typing this from the system and can access it normally, but I haven’t figured out why it froze. Edit: what is causing these “parent transid verify failed” and similar errors?

Figured I’d check the GNOME logs and noticed some btrfs-related messages after boot.

btrfs messages
17:42:25 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 740589568 mirror 1 wanted 1183 found 2551
17:42:25 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 740589568 mirror 1 wanted 1183 found 2551
17:42:25 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 740589568 mirror 2 wanted 1183 found 2551
17:41:33 kernel: BTRFS error (device dm-3): unable to fixup (regular) error at logical 123462418432 on dev /dev/dm-3 physical 122380288000
17:38:51 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 740589568 mirror 1 wanted 1183 found 2551
17:38:51 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 740589568 mirror 2 wanted 1183 found 2551
17:37:59 kernel: BTRFS error (device dm-3): unable to fixup (regular) error at logical 123462418432 on dev /dev/dm-3 physical 122380288000
17:36:30 kernel: BTRFS error (device dm-3): fixed up error at logical 1202755403776 on dev /dev/dm-2 physical 13020102656
17:36:30 kernel: BTRFS error (device dm-3): fixed up error at logical 1202755403776 on dev /dev/dm-2 physical 13020102656
17:36:29 kernel: BTRFS error (device dm-3): fixed up error at logical 1198084718592 on dev /dev/dm-2 physical 12644384768
17:36:29 kernel: BTRFS error (device dm-3): fixed up error at logical 1198084718592 on dev /dev/dm-2 physical 12644384768
17:36:29 kernel: BTRFS error (device dm-3): fixed up error at logical 1198084718592 on dev /dev/dm-2 physical 12644384768
17:36:29 kernel: BTRFS error (device dm-3): fixed up error at logical 1198084718592 on dev /dev/dm-2 physical 12644384768
17:36:28 kernel: BTRFS error (device dm-3): fixed up error at logical 1166903738368 on dev /dev/dm-2 physical 11528175616
17:36:28 kernel: BTRFS error (device dm-3): fixed up error at logical 1166903738368 on dev /dev/dm-2 physical 11528175616
17:36:28 kernel: BTRFS error (device dm-3): fixed up error at logical 1166903738368 on dev /dev/dm-2 physical 11528175616
17:36:28 kernel: BTRFS error (device dm-3): fixed up error at logical 1166903738368 on dev /dev/dm-2 physical 11528175616
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329291776 on dev /dev/dm-2 physical 5211291648
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329291776 on dev /dev/dm-2 physical 5211291648
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329488384 on dev /dev/dm-2 physical 5211488256
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329488384 on dev /dev/dm-2 physical 5211488256
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329488384 on dev /dev/dm-2 physical 5211488256
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329488384 on dev /dev/dm-2 physical 5211488256
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329881600 on dev /dev/dm-2 physical 5211881472
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329881600 on dev /dev/dm-2 physical 5211881472
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329881600 on dev /dev/dm-2 physical 5211881472
17:36:22 kernel: BTRFS error (device dm-3): fixed up error at logical 676329881600 on dev /dev/dm-2 physical 5211881472
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256814415872 on dev /dev/dm-2 physical 160759808
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256814415872 on dev /dev/dm-2 physical 160759808
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256784334848 on dev /dev/dm-2 physical 130678784
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256784334848 on dev /dev/dm-2 physical 130678784
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256784334848 on dev /dev/dm-2 physical 130678784
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256784334848 on dev /dev/dm-2 physical 130678784
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256784334848 on dev /dev/dm-2 physical 130678784
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256784334848 on dev /dev/dm-2 physical 130678784
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256784334848 on dev /dev/dm-2 physical 130678784
17:36:16 kernel: BTRFS error (device dm-3): fixed up error at logical 256784334848 on dev /dev/dm-2 physical 130678784
17:08:54 kernel: BTRFS error (device dm-3): level verify failed on logical 1250673917952 mirror 2 wanted 0 found 1
17:08:54 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 1250673737728 mirror 2 wanted 133346 found 133001
17:08:54 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 1250673704960 mirror 2 wanted 133346 found 133001
17:08:54 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 1250673016832 mirror 2 wanted 133346 found 133001
17:08:54 kernel: BTRFS error (device dm-3): parent transid verify failed on logical 1250671280128 mirror 2 wanted 133346 found 133001

Here is the latest output of dmesg:

[ 1770.686229] BTRFS error (device dm-3): unable to fixup (regular) error at logical 123462418432 on dev /dev/dm-3 physical 122380288000
[ 1770.687652] BTRFS warning (device dm-3): checksum error at logical 123462418432 on dev /dev/dm-3, physical 122380288000, root 257, inode 18573628, offset 0, length 4096, links 1 (path: ostree/deploy/default/var/lib/containers/storage/overlay/d419170eda1698673bca47e8441bc9cbc736e453f8cdf94e58399f5f2919e8a2/diff/usr/lib64/libyaml-0.so.2.0.9)
[ 1822.235902] BTRFS error (device dm-3): parent transid verify failed on logical 740589568 mirror 2 wanted 1183 found 2551
[ 1822.237474] BTRFS error (device dm-3): parent transid verify failed on logical 740589568 mirror 1 wanted 1183 found 2551
[ 1822.240802] BTRFS info (device dm-3): scrub: not finished on devid 1 with status: -5

I have run btrfs scrub and a read-only btrfs check. I’ve also prepared a live ISO, but since I still don’t know exactly where to look after a lot of digging, I’m asking here before I do anything stupid like running --repair without thinking it through.

It would be best to have the entire unfiltered dmesg. The fixup messages suggest one of the two copies of metadata was corrupted for some reason, and Btrfs is detecting this corruption and replacing the bad copy with a good copy.

The inability to fix could mean that both copies of the metadata are corrupt, or that the corruption hit a data block rather than a metadata block - data blocks have only one copy unless you’re using some form of RAID 1 or higher.

The question is why is this happening? Is it related to the del_list corruption messages? Or is it some kind of slow death of the storage? I’m not sure based on the available information.

Here is the dmesg for the day the issue occurred. I ran journalctl -k -b -21 > ~/journalctl-feb27.log

The pastebin expires after 1 day; not sure if there is a better way to share the entire log?
https://paste.centos.org/view/d27e2882


feb 27 13:40:40 bazzite kernel: BTRFS: device label fedora_fedora devid 5 transid 147150 /dev/dm-0 (253:0) scanned by (udev-worker) (1058)
feb 27 13:40:42 bazzite kernel: BTRFS: device label fedora_fedora devid 1 transid 147150 /dev/dm-1 (253:1) scanned by (udev-worker) (1239)
feb 27 13:40:43 bazzite kernel: BTRFS: device label fedora_fedora devid 2 transid 147150 /dev/dm-2 (253:2) scanned by (udev-worker) (1239)
feb 27 13:40:45 bazzite kernel: BTRFS: device label fedora_fedora devid 4 transid 147150 /dev/dm-3 (253:3) scanned by (udev-worker) (1058)

OK, so this is a 4-device Btrfs file system. Also note that devid 2 is dm-2, which we’ll need to know later.

feb 27 13:40:45 bazzite kernel: BTRFS info (device dm-1): first mount of filesystem ac504488-ad0a-4cf8-a973-6acb4938ccb2
feb 27 13:40:45 bazzite kernel: BTRFS info (device dm-1): using crc32c (crc32c-intel) checksum algorithm
feb 27 13:40:45 bazzite kernel: BTRFS info (device dm-1): using free-space-tree
feb 27 13:40:45 bazzite kernel: BTRFS info (device dm-1): bdev /dev/dm-1 errs: wr 0, rd 0, flush 0, corrupt 637, gen 0
feb 27 13:40:45 bazzite kernel: BTRFS info (device dm-1): bdev /dev/dm-2 errs: wr 6828281, rd 642604, flush 610866, corrupt 0, gen 0

dm-1 has checksum mismatches (corruption). These values are just rudimentary counters: every time such a problem is encountered, the counter is incremented by 1. We don’t know whether it’s 637 separate corruptions encountered once each, or the same corruption encountered 637 times.

Same for the write, read, and flush errors on dm-2. That’s a lot of errors and suggests a device problem of some kind.

The dmesg doesn’t show enough info to figure out which physical devices map to dm-1 and dm-2; that should be in journalctl without the -k (i.e. not restricted to kernel messages). But from the dmesg it seems possible that some of these are USB devices and one is NVMe. I don’t see device errors in this dmesg, but they might be in a previous boot’s log (journalctl -b X -k) and give a clue about what’s going on.
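If it helps, the dm-to-physical mapping can also be checked directly; these are generic commands, nothing specific to this setup:

# Show the device tree, i.e. which disks/partitions sit under each dm-X mapping
lsblk -o NAME,KNAME,TYPE,SIZE,MODEL

# Or ask device-mapper which block devices each mapping depends on
sudo dmsetup deps -o devname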

I’ve seen quite a lot of drives in USB enclosures not behaving well unless connected to an externally powered USB hub; the built-in hub on the host bus is often inadequate. In some of these cases I’ve also had success disabling UAS for the USB devices instead. They run a bit slower as a result, but they become more stable.
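For reference, disabling UAS is usually done with a usb-storage quirk; the 152d:0578 vendor:product ID below is just a placeholder, use whatever lsusb reports for your enclosure:

# Find the vendor:product ID of the USB enclosure
lsusb

# Ignore UAS for that ID (the "u" quirk flag), either on the kernel command line:
#   usb-storage.quirks=152d:0578:u
# or via a modprobe option (may need an initramfs regeneration to take effect early):
echo "options usb-storage quirks=152d:0578:u" | sudo tee /etc/modprobe.d/disable-uas.conf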

We also need sudo btrfs fi us $MNT to see the full breakdown of the block groups and profiles.
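That’s the abbreviated form; spelled out, assuming $MNT is wherever the Btrfs filesystem is mounted (probably / here):

# Per-device and per-profile breakdown of data/metadata/system block groups
sudo btrfs filesystem usage $MNT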

feb 28 01:18:32 sabre kernel: BTRFS error (device dm-1): parent transid verify failed on logical 466212061184 mirror 2 wanted 134644 found 134526
feb 28 01:18:32 sabre kernel: BTRFS info (device dm-1): read error corrected: ino 0 off 466212061184 (dev /dev/dm-2 sector 4543424)

mirror 2 is dm-2 (from the earlier snippet, at mount time). The transid failure is on one device and is being fixed up from some other source. I’m confused by the reference to ino 0; I’m pretty sure inode 0 is reserved, but I will check on that.

All similar messages like this are passive checksum checks, failures, and corrections from another mirror with a good copy.

My suggestion is to do a scrub.

sudo btrfs scrub start $MNT runs in the background; use sudo btrfs scrub status $MNT to check on it (can be combined with watch). Or use -Bd to run it in the foreground and print per-device stats at the end of the scrub.
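Put together, something like this, with $MNT being your Btrfs mount point:

# Start the scrub in the background
sudo btrfs scrub start $MNT

# Poll progress every 30 seconds
watch -n 30 sudo btrfs scrub status $MNT

# Or instead: run in the foreground (-B) and print per-device stats (-d) when done
sudo btrfs scrub start -Bd $MNT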

OPTIONALLY, you can zero the stats counters with btrfs device stats --reset $MNT before or after the scrub. The idea is to make it easier to notice whether errors are still being detected. The messages say the problems are detected and fixed. Are they really? Why are they still present? Is this a transient problem?
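For example, again assuming $MNT is the mount point:

# Show the per-device error counters (write / read / flush / corruption / generation)
sudo btrfs device stats $MNT

# Zero the counters so any newly detected errors stand out
sudo btrfs device stats --reset $MNT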

Flash drives tend to fail by transiently returning zeros or garbage. So this sounds like early flash drive death. Eventually it’ll just die without further warning.

Some drives can fail by going read-only, and they don’t report that they’ve done so. The kernel and file system still treat the drive as read-write: the writes go to the drive, the drive does not report any error, but the writes are NOT persistent. The only way you find out is when you get piles of read errors, because what’s on the drive is not what Btrfs is expecting, which most often gets reported as transid errors. The metadata being read is valid (it passes the checksum) but it has the wrong transid (wrong transaction generation).

Anyway, this is going to require some isolation to find out which drive is giving you a hard time. And then replace it.

I recommend using btrfs replace for this, not btrfs device add followed by btrfs device remove. The reason is that replace creates a kind of virtual raid1/mirror between the old and replacement devices and does a scrub between them, forcing replication from one to the other. It’s faster than add/remove, and it’s also more power-fail/crash safe than add/remove.
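As a sketch, assuming devid 2 (dm-2, the device racking up errors) turns out to be the culprit and /dev/sdX is the new drive - both placeholders:

# Replace the old device (referenced by devid) with the new one; runs in the background
sudo btrfs replace start 2 /dev/sdX $MNT

# Check on progress
sudo btrfs replace status $MNT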

NOTE that btrfs replace assumes the new block device is as big as or bigger than the old one. Also, replace does not automatically resize the file system on the replacement device; depending on the block group profile, that may be a good thing, or you might want to resize that one device to use all of the available space. Btrfs is super tolerant of different-sized devices, but it can get idiosyncratic as devices get close to full. So it’s a set of tradeoffs, per usual.
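If you do want the replacement device to contribute its full capacity, the resize takes a devid:amount argument; devid 2 below is again a placeholder for whichever device you replaced:

# Grow device 2's slice of the filesystem to the maximum available space
sudo btrfs filesystem resize 2:max $MNT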
