Reliable self-healing file system: Btrfs, OpenZFS, or something else?

Hello Fedora Enthusiasts! :grinning:

I am looking for a self-healing file system for daily use on a desktop and a laptop, as well as for regular backups on an external drive.

My goals:

  1. automatic (on read) or semi-automatic (periodic scans) data corruption detection;
  2. backups and folder synchronization:
    a) easy backups of selected folders from two machines (desktop and laptop) to an external drive - when the same folders exist on both computers, I want to check file condition and synchronize the correct versions of the files between the computers and the drive;
    b) folder synchronization between two machines (desktop and laptop), preferably by rsync;
  3. Support for encryption (LUKS?);
  4. a mechanism for easy file repair (based on a full or partial backup - just point to a folder with copies and the system will pick uncorrupted files and use them to repair the files on the other drive);
  5. easy snapshots for additional periodic local backups and file versioning (in case of unintended modification of a file);
  6. [nice to have] deduplication to eliminate duplicated data on the same disk;

Originally I decided to give Btrfs a try for the system drive as well as for data storage.

I read the series Working with Btrfs, performed the presented hands-on exercises, and then read a few more articles.

My findings related to Btrfs:

  • fsck is not advised for repairing a Btrfs filesystem; it is better to make a backup using a Live CD;
  • btrfs-restore can be used to recover files without modifying them (see the example sketch after this list);
  • “A consistent set of changes. To avoid generating very large amounts of disk activity, btrfs caches changes in RAM for up to 30 seconds” - in other words, up to 30 seconds of data changes might be lost;
  • copy-on-write (COW) does not guarantee access to every past version of a modified file - only the versions captured in snapshots (the user cannot pick a file version from an arbitrary point in time);
  • self-healing does not mean corrupted files will be automatically repaired; data corruption is detected on read or on demand (based on stored checksums)! The official documentation states: “Self-healing - checksums for data and metadata, automatic detection of silent data corruptions”;
  • Btrfs tends to cause file system fragmentation;
  • in the comments under the article “Choose between Btrfs and LVM-ext4”, as well as on Fedora Discussion, I found many reports of issues with Btrfs;
  • Red Hat Enterprise Linux deprecated Btrfs! “Stratis was first released in Red Hat Enterprise Linux 8.0. It is conceived to fill the gap created when Red Hat deprecated Btrfs. (…) Stratis offers powerful features, but currently lacks certain capabilities of other offerings that it might be compared to, such as Btrfs or ZFS. Most notably, it does not support CRCs with self healing.”
  • Btrfs can remount as read-only in case of an error - this might be extremely problematic on a system drive, especially on a laptop away from home!
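
For reference, a minimal sketch of the btrfs-restore usage mentioned above (the device and target path are just placeholders):

    # recover files from a damaged (possibly unmountable) Btrfs device into another
    # directory; the source device is only read, never modified
    sudo btrfs restore /dev/sdb1 /mnt/recovery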

My findings related to OpenZFS:

  • needs to be incorporated into the Linux kernel manually (not built in);
  • ZFS uses more memory and is prone to errors in the case of malfunctioning RAM;
  • unlike Btrfs, ZFS does not support dynamic disk pool expansion (disks cannot be added to or removed from an existing pool without destroying and recreating it).

Main questions:

  1. Btrfs, despite being promising, does not seem mature enough to store important files or to use on a system drive. Am I right? Any thoughts?
  2. I found many positive opinions about ZFS. Unfortunately, due to its license, OpenZFS is not built into the Linux kernel. It is possible to manually add OpenZFS support to Fedora. Any known constraints or drawbacks? Can I mitigate potential problems if I am not using ECC RAM?
  3. bcachefs - a reasonable alternative to Btrfs and OpenZFS, or not yet?
  4. How do I set up an encrypted OpenZFS partition? Will an option for LUKS appear when OpenZFS is installed?

Additional questions:

  1. [Btrfs] How can I repair file(s) based on copies from
    a) a different drive with Btrfs?
    b) a different drive with another file system (e.g. OpenZFS or ext4)?
    c) a different folder (if the folder on the other drive has a different name)?
  2. [OpenZFS] How can I repair file(s) based on copies from
    a) a different drive with OpenZFS?
    b) a different drive with another file system (e.g. Btrfs or ext4)?
    c) a different folder (if the folder on the other drive has a different name)?
  3. [OpenZFS] Any recommended tools to facilitate work with ZFS on Linux?

Enjoy the rest of the day! :grinning:

You should be able to do this with almost any filesystem.

I am not aware of any filesystem that can do this. Honestly, this sounds more like a backup solution than something that would be built into the filesystem.

This is tunable. By default it uses a huge chunk of memory, but you can set it to whatever you want by setting zfs.zfs_arc_max.
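
For example, a minimal sketch capping the ARC at 4 GiB (the exact value is just an illustration):

    # persistent: cap the ZFS ARC at 4 GiB (4294967296 bytes); applies when the module loads
    echo "options zfs zfs_arc_max=4294967296" | sudo tee /etc/modprobe.d/zfs.conf
    # immediate: change the value of the already-loaded module
    echo 4294967296 | sudo tee /sys/module/zfs/parameters/zfs_arc_max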

I don’t think that Btrfs is more prone to corruption than other filesystems. The primary issue with Btrfs is that when corruption happens, it is often unrecoverable. This is especially true in single-disk scenarios.

Yes. OpenZFS provides the needed modules for Fedora.

You need to pay attention to kernel versions, as it sometimes takes time for the zfs modules to catch up with the latest kernels when breaking changes happen.

If you are talking about restoring files from backup, the filesystem doesn’t matter much here.

I really like znapzend for managing snapshot creation and pruning.

That isn’t true.

ZPOOL-ADD(8)                       System Manager's Manual                      ZPOOL-ADD(8)

NAME
       zpool-add — add vdevs to ZFS storage pool

SYNOPSIS
       zpool add [-fgLnP] [-o property=value] pool vdev…

DESCRIPTION
       Adds  the specified virtual devices to the given pool.  The vdev specification is de‐
       scribed in the Virtual Devices section of zpoolconcepts(7).  The behavior of  the  -f
       option, and the device checks performed are described in the zpool create subcommand.

       -f      Forces  use  of  vdevs,  even  if they appear in use or specify a conflicting
               replication level.  Not all devices can be overridden in this manner.

       -g      Display vdev, GUIDs instead of the normal device names.  These GUIDs  can  be
               used  in  place  of  device names for the zpool detach/offline/remove/replace
               commands.

       -L      Display real paths for vdevs resolving all symbolic links.  This can be  used
               to  look  up  the  current block device name regardless of the /dev/disk path
               used to open it.

       -n      Displays the configuration that would be used  without  actually  adding  the
               vdevs.   The  actual  pool creation can still fail due to insufficient privi‐
               leges or device sharing.

       -P      Display real paths for vdevs instead of only the last component of the  path.
               This can be used in conjunction with the -L flag.

       -o property=value
               Sets the given pool properties.  See the zpoolprops(7) manual page for a list
               of  valid properties that can be set.  The only property supported at the mo‐
               ment is ashift.

EXAMPLES
   Example 1: Adding a Mirror to a ZFS Storage Pool
       The following command adds two mirrored disks to the pool tank, assuming the pool  is
       already made up of two-way mirrors.  The additional space is immediately available to
       any datasets within the pool.
             # zpool add tank mirror sda sdb

   Example 2: Adding Cache Devices to a ZFS Pool
       The following command adds two disks for use as cache devices to a ZFS storage pool:
             # zpool add pool cache sdc sdd

       Once  added, the cache devices gradually fill with content from main memory.  Depend‐
       ing on the size of your cache devices, it could take over an hour for them  to  fill.
       Capacity and reads can be monitored using the iostat subcommand as follows:
             # zpool iostat -v pool 5

SEE ALSO
       zpool-attach(8),      zpool-import(8),      zpool-initialize(8),     zpool-online(8),
       zpool-remove(8)

OpenZFS                                March 16, 2022                           ZPOOL-ADD(8)
ZPOOL-REMOVE(8)                    System Manager's Manual                   ZPOOL-REMOVE(8)

NAME
       zpool-remove — remove devices from ZFS storage pool

SYNOPSIS
       zpool remove [-npw] pool device…
       zpool remove -s pool

DESCRIPTION
       zpool remove [-npw] pool device…
               Removes  the  specified device from the pool.  This command supports removing
               hot spare, cache, log, and both mirrored and non-redundant primary  top-level
               vdevs, including dedup and special vdevs.

               Top-level vdevs can only be removed if the primary pool storage does not con‐
               tain  a  top-level raidz vdev, all top-level vdevs have the same sector size,
               and the keys for all encrypted datasets are loaded.

               Removing a top-level vdev reduces the total amount of space  in  the  storage
               pool.   The specified device will be evacuated by copying all allocated space
               from it to the other devices in the pool.  In this  case,  the  zpool  remove
               command  initiates the removal and returns, while the evacuation continues in
               the background.  The removal progress can be monitored with zpool status.  If
               an I/O error is encountered during the removal process it will be  cancelled.
               The  device_removal  feature flag must be enabled to remove a top-level vdev,
               see zpool-features(7).

               A mirrored top-level device (log or data) can be removed  by  specifying  the
               top-  level  mirror  for  the same.  Non-log devices or data devices that are
               part of a mirrored configuration can be removed using the zpool  detach  com‐
               mand.

               -n      Do  not  actually  perform the removal ("No-op").  Instead, print the
                       estimated amount of memory that will be used by the mapping table af‐
                       ter the removal completes.  This is nonzero only for top-level vdevs.

               -p      Used in conjunction with the -n flag, displays  numbers  as  parsable
                       (exact) values.

               -w      Waits until the removal has completed before returning.

       zpool remove -s pool
               Stops and cancels an in-progress removal of a top-level vdev.

EXAMPLES
   Example 1: Removing a Mirrored top-level (Log or Data) Device
       The following commands remove the mirrored log device mirror-2 and mirrored top-level
       data device mirror-1.

       Given this configuration:
               pool: tank
              state: ONLINE
              scrub: none requested
             config:

                      NAME        STATE     READ WRITE CKSUM
                      tank        ONLINE       0     0     0
                        mirror-0  ONLINE       0     0     0
                          sda     ONLINE       0     0     0
                          sdb     ONLINE       0     0     0
                        mirror-1  ONLINE       0     0     0
                          sdc     ONLINE       0     0     0
                          sdd     ONLINE       0     0     0
                      logs
                        mirror-2  ONLINE       0     0     0
                          sde     ONLINE       0     0     0
                          sdf     ONLINE       0     0     0

       The command to remove the mirrored log mirror-2 is:
             # zpool remove tank mirror-2

       The command to remove the mirrored data mirror-1 is:
             # zpool remove tank mirror-1

SEE ALSO
       zpool-add(8),       zpool-detach(8),      zpool-labelclear(8),      zpool-offline(8),
       zpool-replace(8), zpool-split(8)

OpenZFS                                March 16, 2022                        ZPOOL-REMOVE(8)

The integration of the kernel modules is automated by DKMS. You just need to install the RPM packages and everything is automatic after that. When a new Linux kernel is installed, new ZFS kernel modules will automatically be built and installed (this does significantly increase the time it takes to install a new kernel).

The technologies you are talking about are all quite complex. After you have become familiar with them, they can be used to increase your system’s resilience to failure. However, you might want to be careful about putting anything important on a system that you aren’t yet familiar with and, therefore, might be more likely to corrupt by issuing incorrect commands.


That is sort of how ZFS’s self healing works (but it is not based on a “backup”, it is based on another live disk mirror): https://www.youtube.com/watch?v=MsY-BafQgj4&t=1560s

This is a really good point. ZFS, especially, isn’t something you can just set up and run without any research and expect good results. Some of the settings are set once at pool creation and cannot be changed later, so it is critical to get those right, which means you need to spend time planning in advance.
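
For example, ashift (the sector-size exponent) is one of those create-time settings - it is fixed per vdev and cannot be changed afterwards (pool and device names are placeholders):

    # ashift=12 assumes 4 KiB physical sectors; a wrong value here can only be
    # fixed by recreating the vdev
    zpool create -o ashift=12 tank /dev/sda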


Dear dalto and Gregory Lee Bartholomew,
Thank you for your comments. I find them very helpful! :grinning:

rsync can detect changes between two files based on their checksums, but then I need to read both copies of a file and I still do not know which one is the correct one. If the corrupted copy is recognized on read (as it should be by a COW file system), then I would need to copy the whole file with rsync, while a COW file system checksums chunks of files and should be able to synchronize only the corrupted part of a file instead of the whole file. That is why I asked about file system mechanisms. I want to use rsync mainly (only?) to copy new and updated files.
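
For clarity, the copy step I have in mind is roughly this (the paths and flags are only examples):

    # copy new and updated files to the backup drive; --checksum compares file
    # contents instead of size/mtime, but it has to read every file on both sides
    rsync -avh --checksum /home/user/Documents/ /run/media/user/backup/Documents/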

Good to know. Thank you. :grinning:

Thank you for the warning. That is very important!

As far as I know, COW file systems should be able to use a copy of a file to replace only the corrupted chunks of that file instead of copying the whole file. Am I missing something here?

Thank you for the recommendation. ZnapZend looks interesting.

That is good news! It seems the article I found is not up to date. Thank you for the correction! :grinning:

I found this information on the OpenZFS webpage too. Thank you for stressing its importance! :slight_smile:

That is interesting. Do I need to do something to ensure the ZFS modules will be automatically rebuilt? Earlier I found:

  1. By default ZFS may be removed by kernel package updates. To lock the kernel version to only ones supported by ZFS to prevent this:
    echo 'zfs' > /etc/dnf/protected.d/zfs.conf
    Fedora — OpenZFS documentation

Fully agree! That is why I presented my findings and questions here. I want to choose the most suitable file system and go deep into it. :grinning:

I will watch it! Thank you. :slight_smile:

A filesystem will be able to do automatic healing of a file if it has sufficient data to do so.

However, it won’t compare different files or use a second file for healing, especially if that file is on a different filesystem. Healing won’t reach outside the filesystem for data.

I am not aware of any filesystems using a copy of the file to do this. They can repair using multiple copies of data for the same file but you are talking about actually using different files.

dkms modules will automatically be rebuilt whenever the kernel updates.
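
If you want to verify it after a kernel update, something like this should do:

    dkms status zfs                      # should list the zfs module built for the running kernel
    sudo modprobe zfs && zfs version     # confirm the module loads and matches the userland tools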

Thank you for the prompt response, dalto.

So, how can I ensure that files will be usable to repair the file system? Will only files from a subvolume on another physical drive be usable? Could you please elaborate on this topic?

Nice to know. :grinning:

You need more than one copy of the data - not by copying it to new files, but via raid. In other words, if you set up a mirror, zfs will have two copies of the data for every file and will be able to fix it if the data on one side of the mirror is corrupted.
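
A minimal sketch of that (device names are placeholders):

    # two-way mirror: every block is stored on both disks, so a checksum error on one
    # disk can be repaired from the healthy copy (on read or during a scrub)
    zpool create tank mirror /dev/sda /dev/sdb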


Why would a file be corrupt? The most likely reason is a hardware failure in the storage device, in which case copying from a backup to the failing disk is not a fix.

You will need to configure RAID with enough disks to get the resilience that you need. Recovery when a disk fails is then a lot easier: you replace the broken disk and let the OS rebuild the RAID.

Beware that desktop system BIOSes and server system BIOSes handle failing disks in different ways. Desktop systems will often refuse to boot if a failing disk is reported by S.M.A.R.T., whereas a server system will boot and allow you to handle the failing disk as you wish.


Sure. But I need to recognize which copy of a file (on which disk) is corrupted. And in my use case corruption might happen due to an accidental disconnection of the external drive (most likely before copying of files has finished).

Thanks, but RAID is not what I am looking for. I am looking for a solution to back up files/folders from two machines (laptop and desktop) to one external drive, with a mechanism to easily detect if a file in any location got corrupted.
It seems that without RAID I need to manually replace a corrupted file. I hope I will never need to do so.

dalto, Barry A Scott, thank you for your answers. :grinning:

In the case of ZFS, the zpool status command will tell you if any files have been detected as corrupt (and it will list exactly which files are corrupt so you can replace them with known good versions from backups). Also, you should run zpool scrub <pool> from time to time to scan the full filesystem for corrupt files.

$ zpool status
  pool: root
 state: ONLINE
  scan: resilvered 71.5G in 00:07:09 with 0 errors on Mon Jun 12 23:55:59 2023
config:

	NAME        STATE     READ WRITE CKSUM
	root        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    sdb2    ONLINE       0     0     0
	    sda2    ONLINE       0     0     0

errors: No known data errors
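
For example, with the pool named root as above:

    zpool scrub root        # start a full verification of all data against its checksums
    zpool status -v root    # -v also lists the paths of any files with unrecoverable errors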

Just to give an answer to this question as well:

Not yet. It looks interesting and it has the potential to eventually rival btrfs.

However, the current version should be considered more of a tech preview rather than something that you might want to run on anything where your data is even remotely important.

Thanks a lot! :grinning:
It seems that in Btrfs, btrfs-check (btrfs-check(8) — BTRFS documentation) is the equivalent of zpool status. Btrfs also has scrub (Scrub — BTRFS documentation).
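
For my notes, the corresponding Btrfs commands would be something like this (the mount point is just an example):

    sudo btrfs scrub start /mnt/data     # verify all data and metadata against checksums
    sudo btrfs scrub status /mnt/data    # progress and error counts of the running/last scrub
    sudo btrfs device stats /mnt/data    # per-device error counters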

Thank you for confirming my assumption! :grinning:
I will take a look at bcachefs from time to time.


What about encryption in Btrfs and OpenZFS?
Most likely I need to use LUKS for both of them. Does encryption on one or both of these file systems cause higher SSD wear or other significant/potential problems?

For btrfs you should use luks. For zfs you can use the built-in encryption or luks.
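
For the zfs built-in encryption, a minimal sketch would be (pool/dataset names are placeholders):

    # create a natively encrypted dataset; the passphrase is asked for when the key is loaded
    zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt tank/secure
    # after a reboot, load the key and mount the dataset
    zfs load-key tank/secure && zfs mount tank/secure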

There is a known issue with replicating encrypted zfs snapshots causing corruption, but that only impacts you if you are doing that.

For LUKS, encryption does not amplify writes; it just changes what is written.

I have been using LUKS encryption for many years on a range of systems and not encountered any problem.

Thank you for the answer and for the warning about the issue in ZFS. :grinning:
Btrfs is not error free either. It is worth regularly checking the stability status of the features Btrfs supports.

I have also been using LUKS for years, with ext4, and also with no problems. Anyway, I came across a linked discussion and started to wonder whether encryption can really cause additional wear or stability issues on some file systems.
Thank you for sharing your opinion and experience. :grinning: