Suddenly unable to save files, Fedora 34, Btrfs, /dev/sdc2

A few days ago I followed this procedure: "How add more space in a Btrfs Filesystem, How add a new partition, How add a new disk, Linux".
I completed the last step, the balance.

Hello pals, I have a problem with the disk. I was working on the PC when I suddenly became unable to save files; when I tried to save, the editor showed a message saying the disk was in read-only mode.
I had to reboot the PC.
On boot, the PC only shows the "Minimal BASH-like" GRUB prompt.

Now I am booted from a Fedora 34 live USB.

Part of the dmesg output:

[  162.717286] BTRFS info (device sdc2): disk space caching is enabled
[  162.717291] BTRFS info (device sdc2): has skinny extents
[  162.718685] BTRFS error (device sdc2): parent transid verify failed on 225366736896 wanted 293780 found 293201
[  162.718690] BTRFS warning (device sdc2): couldn't read tree root
[  162.718885] BTRFS error (device sdc2): open_ctree failed
[  199.477356] BTRFS info (device sdc2): disk space caching is enabled
[  199.477361] BTRFS info (device sdc2): has skinny extents
[  199.478675] BTRFS error (device sdc2): parent transid verify failed on 225366736896 wanted 293780 found 293201
[  199.478692] BTRFS warning (device sdc2): couldn't read tree root
[  199.479291] BTRFS error (device sdc2): open_ctree failed
[liveuser@localhost-live ~]$ 

BTRFS check

[liveuser@localhost-live ~]$ sudo btrfs check --clear-space-cache v1 /dev/sdc2
Opening filesystem to check...
parent transid verify failed on 225366736896 wanted 293780 found 293201
parent transid verify failed on 225366736896 wanted 293780 found 293201
Ignoring transid failure
ERROR: could not setup extent tree
ERROR: cannot open file system
[liveuser@localhost-live ~]$ 

When I try to mount sdc2 at /run/media/liveuser/f34-sdc2/, it shows a message about wrong fs type:

[liveuser@localhost-live ~]$ mkdir /run/media/liveuser/f34-sdc2
mkdir: cannot create directory ‘/run/media/liveuser/f34-sdc2’: Permission denied
[liveuser@localhost-live ~]$ sudo mkdir /run/media/liveuser/f34-sdc2
[liveuser@localhost-live ~]$ mount /dev/sdc2 /run/media/liveuser/f34-sdc2/
mount: /run/media/liveuser/f34-sdc2: must be superuser to use mount.
[liveuser@localhost-live ~]$ sudo mount /dev/sdc2 /run/media/liveuser/f34-sdc2/
mount: /run/media/liveuser/f34-sdc2: wrong fs type, bad option, bad superblock on /dev/sdc2, missing codepage or helper program, or other error.

Then I ran the command btrfs balance -v -dusage=50 <mounted-partition>:

[liveuser@localhost-live ~]$ btrfs balance -v -dusage=50 /run/media/liveuser/f34-sdc2/
btrfs balance: unknown token '-v'
usage: btrfs balance <command> [options] <path>
   or: btrfs balance <path>

    btrfs balance start [options] <path>
        Balance chunks across the devices
    btrfs balance pause <path>
        Pause running balance
    btrfs balance cancel <path>
        Cancel running or paused balance
    btrfs balance resume <path>
        Resume interrupted balance
    btrfs balance status [-v] <path>
        Show status of running or paused balance

balance data across devices, or change block groups using filters
[liveuser@localhost-live ~]$ 

Command: btrfs filesystem show

[liveuser@localhost-live ~]$ sudo btrfs filesystem show
Label: 'f34_lv_btrfs'  uuid: ff8bfa82-86d7-4170-9e0a-858bf735ed77
	Total devices 2 FS bytes used 106.75GiB
	devid    1 size 135.03GiB used 72.03GiB path /dev/sdc2
	devid    2 size 100.00GiB used 37.00GiB path /dev/sdb5

[liveuser@localhost-live ~]$ 


How can we resolve the problem and get access to the system?


A transid error is particularly hard to fix.

Please only mount your Btrfs tree in read-only mode until you get further instructions on how to fix this.

https://btrfs.wiki.kernel.org/index.php/FAQ#How_do_I_recover_from_a_.22parent_transid_verify_failed.22_error.3F

According to the wiki above, when the "wanted" and "found" values differ by fewer than 20, you can try the mount option:
-o ro,usebackuproot

However, it does not say what to do next if that mount is successful.
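For example, from the live media that would look something like this (the mount point is only an illustration; any empty directory works):

sudo mount -o ro,usebackuproot /dev/sdc2 /mnt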

I am sure @chrismurphy can give you more ideas.


[ 162.718685] BTRFS error (device sdc2): parent transid verify failed on 225366736896 wanted 293780 found 293201

Not good. There’s over 500 commits difference between wanted and found. But we need a complete dmesg for the boot during which the problem started to happen. And that journal is likely buried in a file system you can’t mount, so it’s a difficult problem.

There is some chance the following will work from live media:

mount -o ro,rescue=all,degraded /dev/sdc2 /mnt

Immediately check dmesg to see if it worked, and if not why not.
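For example, to see the most recent kernel messages:

sudo dmesg | tail -n 30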

If that doesn’t work, then I suggest filing a bug at bugzilla.redhat.com: choose component kernel and provide the kernel version at the time of the problem, a description of how you first realized there was a problem, and really as much info as you can provide (you can ignore/delete the template that appears). Once the URL appears in this ask thread, I’ll tag it for btrfs folks to look at in case there’s a bug. But typically this message is the result of one of the drives not honoring flush/fua and getting write order wrong. The thing is, you don’t have a crash or powerfail, and there are two devices - do you know if you’re using btrfs raid1 for metadata? Seems to me it’s more likely a bug if you have raid1 metadata and this problem still happened; whereas if it’s hardware, the raid1 would protect you (most likely).

I also suggest making a btrfs-image, uploading it to a file sharing service, and putting the URL to the image in the bug report. The image contains only metadata, not file contents, and the -ss option below will hash filenames. Very short file names will cause errors (they are really complaints) that you can ignore.

btrfs-image -c 9 -t4 -ss -w /dev/sdc2 /tmp/bugxxxxxxx.img

If the rescue mount option doesn’t work, then the next step is to use btrfs restore to try and get any important files off the file system that aren’t backed up. It’s best to do this first because repair attempts, if they fail, can make restore more difficult.
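A sketch of how that could look from the live media (the destination path is just a placeholder; it must be on a different, healthy disk with enough free space):

sudo btrfs restore -v /dev/sdc2 /path/to/healthy-disk/restore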

Note that balance is not indicated for this problem (the file system did not mount anyway). In general, when there are file system issues you want to change the file system as little as possible, or not at all, until you’ve refreshed backups, because any changes made after a consistency problem is detected can make recovery more difficult.


From the Fedora 34 live USB, trying to mount:

[liveuser@localhost-live ~]$ sudo mount -o ro,rescue=all,degraded /dev/sdc2 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdc2, missing codepage or helper program, or other error.
[liveuser@localhost-live ~]$

Yeah, that mount failure message is generic and doesn’t tell us why; you have to look at the most recent lines in dmesg to know the details. I suspect it’s just the same parent transid error, and the rescue option can’t get past it. So it’s btrfs restore if you’ve got things on this file system you need to save. And btrfs-image for the bug report, so we can see whether it might be a btrfs bug or a drive firmware flush/fua failure.

Also useful here and in the bug report:

smartctl -i /dev/sdc
btrfs insp dump-t -t chunk /dev/sdc2 | grep 'SYSTEM\|METADATA'

The flush/fua issue is closer to rare than common, otherwise lots of people would hit this. So we should see if it’s one of the known/suspected make/models of drive that has this issue. Also the second command will let us know if you have single, dup, or raid1 metadata.

If you need help with btrfs restore it’s a bit easier to do it on either #fedora:matrix.org or #fedora on libera.chat and ping cmurf; there’s quite a lot of back and forth and the particulars of each case matter.


It is important to check for disk problems. In the end the problem was not Btrfs; the problem was the /dev/sdb disk drive, which is in bad condition.

How to check disk drive health

Install smartmontools
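On Fedora, for example:

sudo dnf install smartmontools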

# select your disk drive, sdX
[chris@fedora f34cloud]$ sudo smartctl -i /dev/sda |grep "User Capacity"
User Capacity:    480,103,981,056 bytes [480 GB]
[chris@fedora f34cloud]$ 
[chris@fedora f34cloud]$ sudo smartctl -t short -a /dev/sda | grep "SSD_Life_Left"
231 SSD_Life_Left           0x0000   004   004   000    Old_age   Offline      -       96
[chris@fedora f34cloud]$ 

In my case 004 is a bad result; it means that SSD has problems, and that explains some of the issues with my computer. A good result would be something like 097.
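To dig deeper, smartctl can also run a longer self-test and show the results afterwards (replace sda with your disk):

sudo smartctl -t long /dev/sda
sudo smartctl -l selftest /dev/sda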

More info here.