[Article Proposal] Migrate metadata from Single to DUP profile on btrfs partitions created with btrfs-progs pre-5.15

mateusrodcosta · December 7, 2023, 5:57pm

Article Summary:

So, basically it’s expected that many users with very old btrfs partitions (more specifically, created with a btrfs-progs version before 5.15) are running with metadata on the Single profile (that is, only one copy), whereas starting in 5.15 the default is for the DUP profile (that is, two copies).

The profile change process is relatively low-risk as long as the user has enough free space (I talked with one user on Telegram about doing it and he bricked his system while attempting it; so far we have mostly chalked it up to low free space). It should be beneficial because it makes the filesystem a bit more resilient to failures: if something in one copy of the metadata gets corrupted, there’s the other copy to rely on.

Article Description:

Basically, this is something I learned about in the BTRFS Matrix channels when I asked for help fixing checksum errors reported by btrfs scrub. It turned out I had to delete the files with checksum problems and get them back (they were ostree-related files, but I never figured out whether the reason they got corrupted was hardware or software). The users on the channels suggested using DUP metadata to avoid future problems (and also because, apparently, if metadata gets corrupted, you are screwed).

So, basically, I found this also while searching for it:

The DUP profile for metadata became the default as of btrfs-progs 5.15, so a filesystem created with an earlier version is likely still using the single profile (IIRC the btrfs-progs version matches the kernel version, so we could work out which Fedora versions are likely to be affected by checking the kernel they shipped).

From Btrfs RAID profiles | Forza's Ramblings those seem very important:

With btrfs-progs 5.14 and earlier the defaults were different on flash based media such as NVMe and SSD and normal rotational HDDs. The default was DUP metadata on HDDs and SINGLE metadata on NVMe and SSD. It was changed because it increases reliability and resiliency of the filesystem.

DUP means duplicate. This ensures two copies exists on the same disk. Can be used on one or several drives like SINGLE mode but does not protect against disk failures.

DUP mode protects against data or metadata corruption, but not disk failures.

Always use a redundant profile such as DUP or RAID1 for metadata, even if you use SINGLE or RAID0 for data. This protects the filesystem from many types of otherwise irreparable damage if there were some corruptions in the metadata.

Also, the article would be targeted at users of the Single profile for data, as any other profile was likely chosen manually by the user and we can assume that they “likely know what they are doing”, although they could reuse it if applicable to their system.

The article would end up not being much different than the Manjaro blog post above as we would need:

Explain why older btrfs filesystems are not in the dup profile (i.e. created with pre-btrfs-progs 5.15)
Explain why they should be and what the benefits are
Show the command to check the profiles
Give the disclaimer that the process is very low risk, but the user should be sure that backups are working and that there’s enough disk space (since there will be an extra copy of Metadata and System, the free space needed is a bit over their combined size)
Show the command to actually upgrade the profile
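
As a rough sketch of what the "check" and "upgrade" steps in the list above might look like: the function names below are hypothetical helpers, not part of btrfs-progs, and the parsing assumes `btrfs filesystem df` prints lines such as `Metadata, single: total=1.00GiB, used=512.00MiB`. As far as I know, converting metadata also converts the System chunks, but that should be confirmed on the system in question.

```shell
# Sketch only: helper names are made up for illustration.

# Print the block group profile of one chunk type (Data, Metadata, System)
# by parsing lines like "Metadata, single: total=1.00GiB, used=512.00MiB".
btrfs_profile() {
    # $1 = mount point, $2 = chunk type
    btrfs filesystem df "$1" |
        awk -F'[,:] *' -v kind="$2" '$1 == kind { print $2 }'
}

# Estimate how many bytes a second copy of Metadata and System would need,
# using the raw byte counts from "btrfs filesystem df -b".
# Uses the allocated "total" figure, which is a conservative guess.
dup_extra_bytes() {
    btrfs filesystem df -b "$1" | awk -F'[,:] *' '
        $1 == "Metadata" || $1 == "System" {
            sub(/^total=/, "", $3)    # keep only the number
            sum += $3
        }
        END { printf "%.0f\n", sum }'
}

# Convert metadata to the DUP profile. Run as root, with working backups
# and enough unallocated space on the device.
convert_metadata_to_dup() {
    btrfs balance start -mconvert=dup "$1"
}
```

Something like `btrfs_profile / Metadata` would print `single` or `DUP`, and the number from `dup_extra_bytes /` can be compared against the "Device unallocated" figure reported by `btrfs filesystem usage /` before running `convert_metadata_to_dup /`.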

If possible it would be nice to have someone with more knowledge of BTRFS to give some feedback on the article before it’s published.

glb · December 7, 2023, 6:19pm

@chrismurphy Do you think something like this would be good to publish on Fedora Magazine?

Thanks.

chrismurphy · December 9, 2023, 11:54pm

It’s a good question. In some sense it’s a roll of the dice because while DUP is a better default for the reasons cited, the conversion isn’t without risk. The risk is really small. But it’s a pretty high penalty to get snared by a problem. I’m confident I could help the person recover but it’s still a tedious thing.

So I’d point out, no matter the risk, please back up important data. Also, I can’t really assess a particular person’s chance of a one-off media error that would mean they’re better off with DUP metadata (because now the file system can self-heal) versus the chance they end up with a wedged file system that they now need to ask for help fixing.

It’s one of those things where, yes it’s a better default, but it might be splitting hairs (even odds) if it’s worth converting.

I have converted all of my file systems, zero problems. It’s a guess but it might be 1 in 7000 people who do end up with a problem. So… yeah. Not zero.

chrismurphy · December 9, 2023, 11:57pm

Also, anyone having a problem with convert is a bug. But also it’s not necessarily a reproducible bug. The problem with file systems is the older they get, the more non-deterministic they get. So even though there are millions of Btrfs tests happening every kernel release by many different folks, the bulk of tests with new kernels are automated tests. And the automated tests don’t create aged file systems that have a healthy amount of non-deterministic characteristics.

glb · December 10, 2023, 12:27am

My feeling, FWIW, is that if a person really wants that sort of data resiliency/redundancy, they should go all-out and add a second (ideally identical) drive and configure full RAID1 mirroring of both the data and metadata. There are performance benefits of RAID1 as well since a good filesystem driver can perform sector reads in parallel across the drives (and typical disk access is read-intensive). Do you know if Btrfs’ support for RAID1 has improved or likely will in the near future? If so, DUP might be unnecessary and perhaps even a bit of a performance negative for people who have multiple drives. I have mixed feelings about this article proposal as well.

mateusrodcosta · December 10, 2023, 7:57pm

Then maybe it would be good enough to just have it as a post in the forum?

Well, we could make an article before this one about how to handle possible btrfs errors.

Afaik btrfs has two tools to check for errors:

btrfs scrub which verifies the files checksums
btrfs check (previously btrfsck) which checks for structural problems in the filesystem

Explaining btrfs scrub is quite simple, actually: you start running it (it can be run online, that is, with the fs mounted) and it will verify the checksums.
If checksum mismatches appear, it can either repair them with a valid copy stored somewhere else (likely a copy from RAID, not sure if copies in subvols also work) or do nothing (if running read-only or if there is no backup copy).

If there is no backup copy that btrfs scrub can use to fix the error automatically, it should be possible to find the affected files and delete them. The file is permanently lost if there’s no way to re-create or retrieve it, but if it can be re-generated or re-downloaded, this should cause almost no issues.

For example:

Most recently I had a bug where my system went read-only for random reasons, and in a few of those cases I figured out that btrfs scrub showed me errors related to ostree files.
To get rid of those ostree files I had to not only delete them from the fs, but also run ostree fsck and ostree pull as a way to get the metadata of a few past commits so rpm-ostree could work properly.
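
The scrub workflow described above could be sketched roughly as follows. The helper names are hypothetical, and the log parsing assumes the kernel reports damaged files with lines containing `checksum error` and ending in `(path: some/file)`; the exact wording may differ between kernel versions, so treat the pattern as an assumption.

```shell
# Sketch only: helper names are made up for illustration.

# Run a scrub in the foreground (-B waits until it finishes),
# then print the summary. Works on a mounted filesystem; run as root.
scrub_and_report() {
    btrfs scrub start -B "$1"
    btrfs scrub status "$1"
}

# Read kernel log lines on stdin and print the unique file paths that
# appear in checksum error messages.
# ASSUMPTION: affected lines end in "(path: <file>)".
paths_with_checksum_errors() {
    sed -n 's/.*BTRFS.*checksum error.*(path: \(.*\))$/\1/p' | sort -u
}
```

After `scrub_and_report /`, something like `journalctl -k | paths_with_checksum_errors` would list candidate files to restore or delete. The paths in these messages are relative to the subvolume they belong to, not necessarily to `/`.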

btrfs check is a bit more complicated, as everything you read about it tells you to use it as a last resort.
Still, it should be possible to run btrfs check --readonly to check for any problems without doing any changes.
IT’S RECOMMENDED TO ONLY USE THE REPAIR FLAG (--repair) IF RECOMMENDED BY A BTRFS DEVELOPER, the rationale being that this could be a potentially destructive action and can lead to data loss. EVEN WHEN USING THE REPAIR FLAG IT’S STILL RECOMMENDED TO HAVE BACKUP OF ANY IMPORTANT DATA.
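
A minimal sketch of the read-only check, assuming the filesystem lives on a device that can be unmounted first (for a root filesystem that usually means booting from live media). `check_readonly` is a hypothetical wrapper and `/dev/sdX1` is a placeholder device name.

```shell
# Sketch only: check_readonly is a made-up helper name.

# Run a read-only consistency check on an unmounted btrfs device.
# Nothing is written to the filesystem in this mode.
check_readonly() {
    dev="$1"
    # Refuse to continue if the device is currently mounted.
    if findmnt --source "$dev" >/dev/null 2>&1; then
        echo "$dev is mounted; unmount it first" >&2
        return 1
    fi
    btrfs check --readonly "$dev"
}
```

Usage would be something like `check_readonly /dev/sdX1` as root. The wrapper deliberately has no way to pass `--repair`.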

I actually did have a use for btrfs check recently.
I didn’t know it at the time, but apparently the inode for a specific folder inside the data folder for Discord was borked. This effectively meant I was unable to completely delete that directory.

So, I only learned of it via btrfs check and, after checking whether this kind of error was safe enough to fix with btrfs check --repair, it did seem so.

I’m not sure how easy it would be to explain that this seems to be the BTRFS procedure for dealing with errors, though.

glb · December 10, 2023, 8:11pm

An article about how to analyze and troubleshoot Btrfs filesystems would be awesome. +1 to that.

Then maybe it would be good enough to just have it as a post in the forum?

Quick Docs would probably be a better place than this forum for something like this.

chrismurphy · December 12, 2023, 2:37am

I think it’s good enough for now. I’m not opposed to an article for more broad advocacy, but it’s probably not possible to overemphasize backups.

Note also that Btrfs is doing a kind of passive scrub all the time, on every read. In normal operation it never permits corrupt data to get to user space. Btrfs issues EIO for any read detecting a corrupt data block. How this IO error is handled is up to the application requesting the data (it’s not always well handled).
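
Because a read of a corrupt block fails instead of returning bad data, simply reading every file is one rough way to surface damaged files. The sketch below uses a hypothetical helper name; note that it reports any read failure, so a permission error would be listed too, not only EIO.

```shell
# Sketch only: find_unreadable_files is a made-up helper name.

# Read every regular file under a directory, staying on one filesystem
# (-xdev), and print the names of files that cannot be read in full.
# Any read error counts, not just an I/O error from a checksum mismatch.
find_unreadable_files() {
    find "$1" -xdev -type f -exec sh -c '
        for f in "$@"; do
            cat -- "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}
```

For example, `find_unreadable_files /home` run as root would print one path per unreadable file and nothing at all when every file reads cleanly.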

The block group profile needs to be DUP (two copies on the same device) or raid1 (two or more copies each on different devices) or higher raid level.

scrub will check all metadata and data blocks, recomputing checksum and comparing against stored checksum. It doesn’t verify the correctness of the file system.

check does a consistency check, i.e. whether the file system is logically consistent. It’s only a metadata check; (user) data is not checked at all.

--readonly is the default and it’s safe; --repair should be safe but it still comes with warnings to ask an experienced user or a developer.

The safer options to discuss for recovery are the various rescue mount options. The file system will be mounted read-only, but if the rescue option works, it’s possible to copy data out normally without needing specialized recovery tools. So at least the backup can be refreshed before attempting risky repairs.

I like rescue=all because it’s easy to remember. The one catch is it includes idatacsums which disables data checksum verification. It is possible with this flag enabled, that corrupt data can be replicated into the backup - so any backup made with idatacsums needs to be embargoed, and proven to be OK. But usually in disaster recovery, you want to get everything you can, even if some of it might have corruption. Nevertheless, important to mention.

It’s possible to enable all except idatacsums, but you need to check the man page and pass all the options other than idatacsums.
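
A sketch of both variants, with hypothetical helper names and placeholder device and mount point. The second helper only joins whatever option names it is given; the names in the usage note are my assumption of what btrfs(5) lists, and the colon separator is also an assumption to confirm there.

```shell
# Sketch only: helper names are made up for illustration.

# Mount read-only with every rescue option enabled. This includes
# idatacsums, so data copied out is NOT checksum-verified.
rescue_mount_all() {
    # $1 = device, $2 = mount point
    mount -o ro,rescue=all "$1" "$2"
}

# Build a mount option string from a list of rescue option names,
# leaving one of them out.
# $1 = name to skip, remaining arguments = names taken from the man page.
# ASSUMPTION: multiple rescue options are joined with ":".
rescue_opts_except() {
    skip="$1"
    shift
    opts=""
    for o in "$@"; do
        [ "$o" = "$skip" ] && continue
        opts="${opts:+$opts:}$o"
    done
    printf 'ro,rescue=%s\n' "$opts"
}
```

With the option names from the man page on hand, the call might look like `mount -o "$(rescue_opts_except idatacsums usebackuproot nologreplay ibadroots idatacsums)" /dev/sdX1 /mnt`, which keeps data checksum verification active while copying files out.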
