Recovering from BTRFS error in multi-device volume

I was adding a new SSD to my system, and I decided to add a device to my current BTRFS volume instead of creating a separate filesystem. But soon (a few minutes) after I added the device, the filesystem turned read-only (ro), and I haven’t been able to remount it read-write (rw).

Hardware

  • original SSD: 1TB NVMe SSD plugged into M.2_2 (the 2nd slot)
    • (for completeness) M.2_1 holds an Optane drive that serves as my swap
  • new SSD: 2TB NVMe SSD on an SSD carrier board in PCIe_2 (the 2nd PCIe slot)
    • the 1st PCIe slot holds the graphics card

Existing setup

The disk is split into 3 partitions: fat32 for EFI, ext4 for /boot, and the rest is a BTRFS volume:

Model: KINGSTON SFYRS1000G (nvme)
Disk /dev/nvme1n1: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name                  Flags
 1      1049kB  630MB   629MB   fat32        EFI System Partition  boot, esp
 2      630MB   1704MB  1074MB  ext4
 3      1704MB  1000GB  999GB   btrfs

Steps I followed

AFAIR, these were my steps before the error:

  1. I created a partition table and partitioned the new SSD using gparted (but didn’t format it)
  2. I mounted the root volume with mount -t btrfs /dev/nvme1n1p3 /mnt
  3. From the terminal, as root, I added the device to my current volume with btrfs device add; parted now shows:
    Model: WD_BLACK SN770 2TB (nvme)
    Disk /dev/nvme0n1: 2000GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags: 
    
    Number  Start   End     Size    File system  Name  Flags
     1      1049kB  2000GB  2000GB  btrfs
    
  4. I only wanted the metadata replicated on both devices, so I ran btrfs balance start -mconvert=raid1 /mnt. I don’t recall exactly whether this command failed immediately, or whether I noticed soon after that the volume was now ro and I couldn’t write to it. (See the reconstructed command sequence just below.)
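
For reference, here is a minimal reconstruction of the sequence above (a sketch: the device paths are as they appeared at the time, and the exact balance invocation is from memory):

# step 2: mount the existing BTRFS volume
mount -t btrfs /dev/nvme1n1p3 /mnt
# step 3: add the new, unformatted partition to the volume
btrfs device add /dev/nvme0n1p1 /mnt
# step 4: convert only the metadata profile to raid1
btrfs balance start -mconvert=raid1 /mnt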

If I do btrfs filesystem show now, I see this:

Label: 'fedora_localhost-live'  uuid: e52fb859-29a0-484d-a4af-cfae8a61407e
	Total devices 2 FS bytes used 856.56GiB
	devid    1 size 929.93GiB used 929.92GiB path /dev/nvme1n1p3
	devid    2 size 1.82TiB used 0.00B path /dev/nvme0n1p1

And I see these errors from the kernel when I boot:

[ 2113.118428] ------------[ cut here ]------------
[ 2113.118430] BTRFS: Transaction aborted (error -28)
[ 2113.118438] WARNING: CPU: 8 PID: 3623 at fs/btrfs/transaction.c:2021 btrfs_commit_transaction+0xe45/0x1020
[ 2113.118446] Modules linked in: snd_seq_dummy snd_hrtimer rfcomm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr bnep mei_hdcp mei_pxp iwlmvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel mac80211 snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec libarc4 snd_hda_core snd_hwdep joydev snd_seq intel_rapl_msr snd_seq_device intel_rapl_common btusb iwlwifi snd_pcm btrtl mei_gsc edac_mce_amd btintel mei_me snd_timer btbcm btmtk kvm_amd snd mei cfg80211 bluetooth soundcore kvm eeepc_wmi asus_wmi irqbypass ledtrig_audio i2c_piix4 k10temp sparse_keymap rapl platform_profile rfkill gpio_amdpt pcspkr wmi_bmof gpio_generic nfnetlink zram squashfs isofs hid_logitech_hidpp xe drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel drm_buddy
[ 2113.118559]  sha512_ssse3 nvme uas video i2c_algo_bit usb_storage sha256_ssse3 r8169 nvme_core drm_display_helper sha1_ssse3 ccp realtek nvme_auth sp5100_tco cec ttm wmi hid_logitech_dj sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi loop fuse
[ 2113.118611] CPU: 8 PID: 3623 Comm: btrfs-transacti Tainted: G        W          6.8.5-301.fc40.x86_64 #1
[ 2113.118615] Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS (WI-FI), BIOS 2806 10/27/2022
[ 2113.118617] RIP: 0010:btrfs_commit_transaction+0xe45/0x1020
[ 2113.118621] Code: e9 36 f6 ff ff be 01 00 00 00 89 04 24 e8 23 81 20 00 8b 04 24 e9 e9 f8 ff ff 8b 74 24 08 48 c7 c7 e0 1c b5 b9 e8 5b 25 ae ff <0f> 0b e9 03 f8 ff ff f0 83 44 24 fc 00 48 8b 90 60 03 00 00 65 ff
[ 2113.118624] RSP: 0018:ffffa59356ef3e28 EFLAGS: 00010282
[ 2113.118628] RAX: 0000000000000000 RBX: ffff88c713094dc8 RCX: 0000000000000027
[ 2113.118630] RDX: ffff88ce0de218c8 RSI: 0000000000000001 RDI: ffff88ce0de218c0
[ 2113.118632] RBP: ffff88c71353e600 R08: 0000000000000000 R09: ffffa59356ef3c98
[ 2113.118634] R10: ffffffffba516808 R11: 0000000000000003 R12: ffff88c713094d18
[ 2113.118637] R13: ffff88c710783000 R14: ffff88c713094e30 R15: 00000000ffffffe4
[ 2113.118639] FS:  0000000000000000(0000) GS:ffff88ce0de00000(0000) knlGS:0000000000000000
[ 2113.118642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2113.118644] CR2: 00007fef1828d3e8 CR3: 0000000556422000 CR4: 0000000000350ef0
[ 2113.118647] Call Trace:
[ 2113.118649]  <TASK>
[ 2113.118651]  ? btrfs_commit_transaction+0xe45/0x1020
[ 2113.118655]  ? __warn+0x81/0x130
[ 2113.118661]  ? btrfs_commit_transaction+0xe45/0x1020
[ 2113.118665]  ? report_bug+0x16f/0x1a0
[ 2113.118671]  ? handle_bug+0x3c/0x80
[ 2113.118676]  ? exc_invalid_op+0x17/0x70
[ 2113.118679]  ? asm_exc_invalid_op+0x1a/0x20
[ 2113.118688]  ? btrfs_commit_transaction+0xe45/0x1020
[ 2113.118693]  ? btrfs_commit_transaction+0xe45/0x1020
[ 2113.118697]  ? srso_return_thunk+0x5/0x5f
[ 2113.118700]  ? start_transaction+0xc7/0x810
[ 2113.118704]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 2113.118710]  transaction_kthread+0x159/0x1c0
[ 2113.118715]  ? __pfx_transaction_kthread+0x10/0x10
[ 2113.118718]  kthread+0xe8/0x120
[ 2113.118722]  ? __pfx_kthread+0x10/0x10
[ 2113.118726]  ret_from_fork+0x34/0x50
[ 2113.118730]  ? __pfx_kthread+0x10/0x10
[ 2113.118734]  ret_from_fork_asm+0x1b/0x30
[ 2113.118743]  </TASK>
[ 2113.118744] ---[ end trace 0000000000000000 ]---
[ 2113.118747] BTRFS info (device nvme1n1p3: state A): dumping space info:
[ 2113.118750] BTRFS info (device nvme1n1p3: state A): space_info DATA has 75691053056 free, is not full
[ 2113.118754] BTRFS info (device nvme1n1p3: state A): space_info total=989896638464, used=914205454336, pinned=0, reserved=0, may_use=0, readonly=131072 zone_unusable=0
[ 2113.118758] BTRFS info (device nvme1n1p3: state A): space_info METADATA has -537264128 free, is full
[ 2113.118761] BTRFS info (device nvme1n1p3: state A): space_info total=8598323200, used=5514952704, pinned=0, reserved=0, may_use=537264128, readonly=3083370496 zone_unusable=0
[ 2113.118765] BTRFS info (device nvme1n1p3: state A): space_info SYSTEM has 0 free, is not full
[ 2113.118768] BTRFS info (device nvme1n1p3: state A): space_info total=4194304, used=131072, pinned=0, reserved=0, may_use=0, readonly=4063232 zone_unusable=0
[ 2113.118772] BTRFS info (device nvme1n1p3: state A): global_block_rsv: size 536870912 reserved 536870912
[ 2113.118775] BTRFS info (device nvme1n1p3: state A): trans_block_rsv: size 0 reserved 0
[ 2113.118778] BTRFS info (device nvme1n1p3: state A): chunk_block_rsv: size 0 reserved 0
[ 2113.118780] BTRFS info (device nvme1n1p3: state A): delayed_block_rsv: size 0 reserved 0
[ 2113.118782] BTRFS info (device nvme1n1p3: state A): delayed_refs_rsv: size 0 reserved 0
[ 2113.118786] BTRFS: error (device nvme1n1p3: state A) in cleanup_transaction:2021: errno=-28 No space left
[ 2113.118822] BTRFS error (device nvme1n1p3: state EA): Error removing orphan entry, stopping orphan cleanup
[ 2113.118825] BTRFS error (device nvme1n1p3: state EA): could not do orphan cleanup -28
[ 2113.118856] BTRFS error (device nvme1n1p3: state EA): commit super ret -30
[ 2113.122929] BTRFS error (device nvme1n1p3: state EA): open_ctree failed

NOTE: I also noticed that when I first added the SSD to the BTRFS volume, the device name of the old drive changed, from /dev/nvme0n1 to /dev/nvme1n1.

Attempts so far

Now when I do btrfs balance status, I see something about a “paused” balance. But I cannot cancel it, because the volume always mounts ro. So I’ve tried to mount it rw by passing -o skip_balance to mount. Since the volume auto-mounts on my machine (it holds / and /home), I tried this from a live Fedora instance. However, that fails too.

I’m hoping to remove the new drive from the volume and go back to how things were. I can then format the new drive as a separate BTRFS volume.
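
If I ever get it mounted rw again, what I have in mind is roughly this (a sketch, using the device names from the btrfs filesystem show output above):

# drop the paused balance, then shrink the volume back to one device
btrfs balance cancel /mnt
btrfs device remove /dev/nvme0n1p1 /mnt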


This is normal, and it is why it was decided to replace device names with UUIDs in most places.

The first device configured gets the first name in the sequence.
It would appear that when there was only one NVMe device, it was given the name nvme0n1, since it was the only one installed.
When you added the second NVMe device, it apparently became nvme0n1 because it was enumerated first, and the older, original one was renamed to nvme1n1.

This breaks anything that works with device names. The UUID, however, does not change; had you used the UUID in your commands, the system would not have gotten confused.
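
For example, the filesystem can be addressed by its UUID, which stays stable no matter how the kernel enumerates the NVMe devices (a sketch, using the UUID from the btrfs filesystem show output above):

# look up the filesystem UUID once
blkid -s UUID -o value /dev/nvme1n1p3
# then mount by UUID instead of by device name
mount -t btrfs UUID=e52fb859-29a0-484d-a4af-cfae8a61407e /mnt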

I’m not sure there are UUID versions of the commands (I looked at the man page before running them). Either way, both btrfs device add and btrfs balance operate on a live (mounted) volume, so it shouldn’t matter: the UUID can be determined from the mount. And since btrfs filesystem show now points to the correct block devices (see the OP), I don’t think that is the problem.

My guess is that, since my drive was getting full (~100 GB remaining), the balance step failed (see the kernel error). But that’s still strange, since I only asked for the metadata to be replicated; I can’t imagine that would be on the order of 100 gigs. I’m not sure how to debug this further.



Let’s look at the kernel error messages:

Looks like there is not enough space for METADATA; the negative free space means it is over-committed and full.
SYSTEM has 0 free as well, but apparently that is not critical.

Global Block Reservations: global_block_rsv shows that the drive has all of its reserved space allocated.

commit super ret -30 is a failure to commit the superblock; -30 is EROFS, meaning the filesystem has gone read-only (probably to stop you from making things worse).

I personally don’t have any experience with btrfs and can’t tell you the best way to fix this. I would first make a backup of the drive, then copy the files off the ro filesystem to the new drive, then format the old drive and add it back to the pool.
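
A rough sketch of that, assuming the read-only mount still works and /dev/sdX1 is a hypothetical partition on a separate backup target:

# the damaged filesystem still mounts read-only
mount -o ro /dev/nvme1n1p3 /mnt
# hypothetical backup disk, formatted and mounted elsewhere
mount /dev/sdX1 /backup
# copy everything, preserving hardlinks, ACLs and xattrs
rsync -aHAX --info=progress2 /mnt/ /backup/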

Also think about the risk of data loss when striping volumes over multiple disks without using btrfs RAID.


ENOSPC - No available disk space | Forza’s Ramblings has some hints on how to recover from ENOSPC errors.

Balancing a Btrfs filesystem | Forza's Ramblings has some interesting information on btrfs balancing.

It also seems to explain what happened in your case:
It is good to have plenty of free space inside metadata chunks. The filesystem uses the metadata space in all its normal operations, and when available metadata space runs out, Btrfs will try to allocate new metadata chunks. However, if there is no Unallocated space available when Btrfs needs to allocate additional metadata chunks, the filesystem will turn read-only and will require manual intervention to recover.

The global block reserve (GBR) was at 100%, so there was no more space to allocate for METADATA.
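
You can see it in the dmesg dump above: global_block_rsv: size 536870912 reserved 536870912, and 536870912 / 2^20 = 512 MiB, i.e. the entire 512 MiB global reserve is already spoken for.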

Man, this filesystem is complex…


This proposed patch series seems related and appears to address your issue.

https://lore.kernel.org/linux-btrfs/cover.1718665689.git.boris@bur.io/

Btrfs’s block_group allocator suffers from a well known problem, that
it is capable of eagerly allocating too much space to either data or
metadata (most often data, absent bugs) and then later be unable to
allocate more space for the other, when needed. When data starves
metadata, this can extra painfully result in read only filesystems that
need careful manual balancing to fix.


It doesn’t seem related. The OP’s problem is that they converted the metadata from the single profile to RAID 1, and there isn’t enough space to hold the RAID 1 metadata on both devices.

But I am a bit confused by this information.
To get more details on the filesystem space usage, they should provide the output of this command: sudo btrfs filesystem usage -T /mountpoint

To solve it, it’s simple: you need to add another partition or disk so the balance can complete, and then, if you want, you can remove the extra disks and return to single.

Firstly, thanks so much @augenauf for searching out and finding the references, although I’m not sure what I can do with that patch series. The description does seem related to my issue, but what the patch implements seems unrelated. For the time being, I’ll read the references. Either way, I agree with your remark: “Man, this filesystem is complex…” :tired_face:

Oh, and thanks for helping me understand the kernel errors. I have a better understanding now, although I still can’t spot what I did incorrectly :confused:


@emanuc to answer your questions:

I had about 100 gigs free when all this happened; I don’t understand how metadata replication could take up close to 10% of the drive capacity.

Here’s the usage summary (after mounting with -o ro):

[root@localhost-live ~]# btrfs filesystem usage /mnt 
Overall:
    Device size:                  2.73TiB
    Device allocated:           929.92GiB
    Device unallocated:           1.82TiB
    Device missing:                 0.00B
    Device slack:                   0.00B
    Used:                       856.56GiB
    Free (estimated):             1.89TiB       (min: 1.89TiB)
    Free (statfs, df):            1.89TiB
    Data ratio:                      1.00
    Metadata ratio:                  1.00
    Global reserve:             512.00MiB       (used: 0.00B)
    Multiple profiles:                 no

Data,single: Size:921.91GiB, Used:851.42GiB (92.35%)
   /dev/nvme0n1p3       921.91GiB

Metadata,single: Size:8.01GiB, Used:5.14GiB (64.14%)
   /dev/nvme0n1p3         8.01GiB

System,single: Size:4.00MiB, Used:128.00KiB (3.12%)
   /dev/nvme0n1p3         4.00MiB

Unallocated:
   /dev/nvme0n1p3         1.00MiB
   /dev/nvme2n1p1         1.82TiB

I don’t quite get it: I added a brand-new drive before I tried to balance, and it’s still showing as unallocated. So how is adding a third drive going to resolve the issue? In any case, I don’t have another drive to add.

In case it helps, the balancing status shows up as:

[root@localhost-live ~]# btrfs balance status /mnt 
Balance on '/mnt' is paused
0 out of about 0 chunks balanced (0 considered), -nan% left

@fatka, I agree with @emanuc: I think what you are experiencing is a symptom of the issue described in the patch. However, you are already in that unfortunate situation, and it needs manual fixing; the patch would only prevent such a situation in the future.

I am sorry I am not able to help fix the issue. If it were my system I would have a few ideas to try, but I have zero experience with btrfs (I am staying away from it), and I won’t advise on something I am not sure about, especially for something this critical.


You appear to have space for another partition on the new drive.

To convert the profiles, you need to have enough space on both devices, because the balance rewrites the block groups (if this point is incorrect, someone will correct me).
From what I see, the /dev/nvme0n1p3 device does not have enough unallocated space.

Make a backup of the data before performing any operations.
One solution is to add a third disk with the necessary space and resume the balancing. Then decide what to do, whether to return to a single disk, etc.
To add a third disk, you need the filesystem to be mounted “rw”. Resolve this with the skip_balance mount option (see the sketch after the quoted manual entry below).

skip_balance

(since: 3.3, default: off)

Skip automatic resume of an interrupted balance operation. The operation can later be resumed with btrfs balance resume, or the paused state can be removed with btrfs balance cancel. The default behaviour is to resume an interrupted balance immediately after a volume is mounted.
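
Putting that together, the recovery could look roughly like this (a sketch; the trick of lending the filesystem a temporary file-backed device in ENOSPC situations comes from the Forza article linked earlier, and the image path is just an example):

# mount without resuming the paused balance
mount -o skip_balance UUID=e52fb859-29a0-484d-a4af-cfae8a61407e /mnt
# drop the paused balance
btrfs balance cancel /mnt
# if more unallocated space is needed to finish, add a temporary
# device backed by a file that lives on another filesystem:
truncate -s 8G /run/media/backup/btrfs-temp.img
losetup -f --show /run/media/backup/btrfs-temp.img   # prints e.g. /dev/loop0
btrfs device add /dev/loop0 /mnt
# ...and remove it again once things are balanced:
btrfs device remove /dev/loop0 /mnt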

PS: Can you please list the various steps you took?

It won’t solve the problem for users who decide to manually convert the profile without checking if there’s enough space.

If, as a Fedora user, I manually change some configuration and then encounter problems due to that incorrect configuration, I certainly can’t blame Fedora.
In this case, there’s no reason to be critical of Btrfs, because the issue is solvable and the user performed a manual operation.

I already tried this; it fails (as mentioned in the OP).

[root@localhost-live ~]# mount -o skip_balance UUID=e52fb859-29a0-484d-a4af-cfae8a61407e /mnt/
mount: /mnt: fsconfig system call failed: No space left on device.
       dmesg(1) may have more information after failed mount system call.

And it generates kernel errors as included in the OP.

Do you need anything specific beyond what is mentioned in the OP? I included everything I did.

Again, I’ll repeat myself: I’m no btrfs engineer, but I did check for empty space in my naive way, by running btrfs filesystem usage /mnt. It showed I had close to 100 gigs (74 GiB, to be more precise) of space remaining (which is still what is shown under Overall: above). I started the balance after that.


I don’t know what gave you the impression anyone is blaming Fedora!? I’ve been a Fedora user for close to 20 years, and exclusively Fedora for slightly less. Can I not ask for help when something goes wrong, user error or not? At least that’s how it used to work on the old users mailing list, and also in this forum.

There is good reason to criticise BTRFS, because, as I mentioned, I did my due diligence (albeit not as a BTRFS expert) before attempting anything. You’ve said “user error” several times in this thread; can you point out what my error was, and where in the docs I could have found the information that would have helped me avoid it?

I’ve used BTRFS on and off for long enough that it ends up biting me every few years, but I still use it because I want to experiment. I would think that gives me some right to criticise it, too.

I think I understand now: balancing requires temporary workspace on each device. In my case, the metadata chunks on the original partition were already well used (64% full) and there was no unallocated space left, so metadata balancing failed. If only the balancing docs gave some guideline, I wouldn’t have attempted it.

It is documented in man btrfs-balance.
And here too: Balance — BTRFS documentation

Before changing profiles, make sure there is enough unallocated space on existing drives to create new metadata block groups (for filesystems over 50GiB, this is 1GB * (number_of_devices + 2)).
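
Applied to this case: with two devices, that is 1 GB × (2 + 2) = 4 GB of unallocated space needed, while the usage output above shows only 1.00 MiB unallocated on the original partition.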

Did you add this as a raid1 array?

I guess I missed the “unallocated” part :frowning_face:. Thank you for highlighting it.


I simply added it with btrfs device add /dev/nvme... /mnt; I’m not sure there is any difference in how the device is added. As I understood it, the raid options are part of the balancing step. Since my drives aren’t equally sized, I only wanted to replicate the metadata, not the data blocks, so I used the command you quoted.


Just for your information, I noticed only now: you don’t need to mount the system partition if it’s already mounted. You can simply do this: sudo btrfs device add -f /dev/nvme1n1p3 /

I hope I’ve been helpful. I enjoy helping with what I know.
