Long-term off-line backup

What’s the best means of backing up important documents for the long term?

I’m thinking of things that I don’t expect to need for years if ever but must keep.

I could store them on my daily driver machine, which I backup regularly. But I would prefer off-line storage to avoid cluttering my machine with files I will likely never access and to protect the files from being hacked.

So what should I do? I’m thinking of things like buying two HDDs and keeping a copy of each important document on each HDD. Then perhaps I would read each file, say, once a year, to ensure it hasn’t been corrupted.

But there must be a better way…?

That’s essentially what I do. I have two sets of identical thumb drives (four total). Each set contains a mirrored and encrypted ZFS filesystem.

Once per year, I mount the drives (zpool import <zfs-pool-name>) and then run zpool scrub <zfs-pool-name> on each pair of mirrored drives. When the scrub is finished, zpool status <zfs-pool-name> will report whether any bitrot was detected and whether it was automatically repaired from the other mirrored drive.

I then update the contents on my primary set with any new data (zfs mount -l <zfs-pool-name1>/<filesystem>, etc.), snapshot it with zfs snapshot <zfs-pool-name1>/<filesystem>@<current-yymmdd> and use zfs send --raw -i @<previous-yymmdd> <zfs-pool-name1>/<filesystem>@<current-yymmdd> | zfs receive -v <zfs-pool-name2>/<filesystem>. Then I export the ZFS pools (zpool export <zfs-pool-name>) and store the drives in separate safes for another year.
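Pulled together, the yearly routine described above looks roughly like this. The pool names (tank1/tank2), dataset name (docs), and snapshot dates are placeholders standing in for the <zfs-pool-name>/<filesystem> values, not the actual setup:

```shell
#!/usr/bin/env bash
# Sketch of the yearly offline-ZFS-backup refresh; names are placeholders.
set -euo pipefail

for pool in tank1 tank2; do
    zpool import "$pool"      # attach the offline mirrored pair
    zpool scrub "$pool"       # verify every checksum, repair from the mirror
done
# ...wait for the scrubs to finish, then check for (un)repaired errors:
zpool status tank1 tank2

# Update the primary set, snapshot it, and replicate the delta to set two.
zfs mount -l tank1/docs       # -l loads the encryption key first
# ...copy any new files into the dataset...
zfs snapshot tank1/docs@250101
zfs send --raw -i @240101 tank1/docs@250101 | zfs receive -v tank2/docs

# Detach cleanly before the drives go back into the safes.
zpool export tank1
zpool export tank2
```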

This system will detect and repair any bitrot that might occur. Also, there are four complete copies of the data and it can all be recovered as long as at least one of the drives is intact. I’ve been doing it this way for about 10 years.


I copy important files/directories to a “data” partition. Periodically, I create a compressed and encrypted archive of the “data” partition using fsarchiver and store the archives in a “backup data” partition. Copying the “backup data” partition to fresh media yields a refreshed copy.
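A minimal sketch of that flow, assuming the “data” partition is /dev/sdb1 and the archive lands on the “backup data” partition (device and paths are placeholders; with fsarchiver, `-c -` prompts for the encryption password):

```shell
# Create a compressed, encrypted archive of the data partition.
fsarchiver savefs -z 7 -c - /media/backup_data/data-2025.fsa /dev/sdb1

# Sanity-check that the archive can be read back.
fsarchiver archinfo /media/backup_data/data-2025.fsa

# Restore, if ever needed (id=0 is the first filesystem in the archive).
fsarchiver restfs /media/backup_data/data-2025.fsa id=0,dest=/dev/sdb1
```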

That’s what I do:

  • Main computer has my mainly-used files
  • NAS on LAN has 1 HDD
  • Spare HDD that’s a clone of NAS (offline/sitting on a shelf unpowered)

Seems simple, but it has worked fine for years :smiley:

My keep-it-simple approach… :upside_down_face:

Home 🡇
     Desktop 🡇
          ⭐ My Data Folder 🡇 (Backup this whole folder daily to another SSD 💾)
                            Folder X 🡇
                                     Data Files                        
                            Folder Y 🡇
                                     Data Files                        
     Documents (Empty...)
     Downloads (Empty....)
     Music (Empty...)
     Pictures 🡇 (Empty...)
              Screenshots (Empty...)
     Public (Empty...)
     Templates (Empty...)
     Videos (Empty...)

All my files are in 1 place… I’ve moved between a few operating systems this way… App settings are restored manually later… Maybe there is a better way, but… :upside_down_face:

Personally, I use the following setup:
Data on a single SSD [1] > Backup to two drives in RAID 1 (Btrfs) [2] > Backup to a USB drive [3]

For data backups I use Pika Backup (Borg).

  1. /dev/sdc on /media/emanu/dati type btrfs (rw,noatime,seclabel,compress=zstd:3,ssd,discard=async,space_cache=v2,autodefrag,subvolid=256,subvol=/@dati)

  2. /dev/sdb1 on /media/emanu/hdd_pool type btrfs (rw,noatime,seclabel,compress=zstd:3,space_cache=v2,autodefrag,subvolid=5,subvol=/)

  3. /dev/sde1 on /media/emanu/backup2T type btrfs (rw,noatime,seclabel,compress=zstd:10,space_cache=v2,autodefrag,subvolid=256,subvol=/@backup)
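For reference, Pika Backup drives BorgBackup underneath; a rough CLI equivalent (repository path, archive name, and retention numbers are illustrative placeholders, not the actual setup) would be:

```shell
# One-time: initialize an encrypted Borg repository on the backup drive.
borg init --encryption=repokey /media/emanu/hdd_pool/borg

# Each run: create a compressed, deduplicated archive of the data disk.
# "{now}" is expanded by borg into a timestamp.
borg create --stats --compression zstd \
    /media/emanu/hdd_pool/borg::dati-{now} /media/emanu/dati

# Prune old archives to keep the repository bounded.
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
    /media/emanu/hdd_pool/borg
```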

M-Disc is best, followed by gold DVDs.
No yearly fuddling needed.


Use rsnapshot (a mature tool that needs few adjustments these days: hourly, daily, and weekly backups, retaining about 6 months of changes) and keep critical data on XFS file systems; my backups are on XFS as well, with one exception: one drive is ext4. Additionally, I have an rsync script doing extra backups with rsync & sha1 of the same docs, just in the unlikely case rsnapshot fails. Also, I have a weekly (monthly might be sufficient) entry in my calendar to check that the backups work, just to rule out that the job automation has somehow broken. I have a total of 5 (physically) different disks.
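The retention scheme above maps to a handful of lines in /etc/rsnapshot.conf. A sketch with placeholder paths (note that rsnapshot requires tab characters, not spaces, between fields):

```
# /etc/rsnapshot.conf excerpt -- fields MUST be separated by tabs.
snapshot_root	/backup/snapshots/

# Retention: roughly 6 months of history via hourly/daily/weekly rotation.
retain	hourly	6
retain	daily	7
retain	weekly	26

# What to back up.
backup	/home/	localhost/
backup	/etc/	localhost/
```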

I also have Btrfs on my system disk and on some non-critical data, with automated Btrfs snapshots, but that is more a convenience for when something fails or breaks, not part of my critical backup strategy. In the long term I’m considering replacing the ext4 drive with something Btrfs-based, but I’m not yet sure if/when to do this.

I also keep some things, non-binary stuff, in daily-auto-updated git repos, which are then also rsync’ed to the backups: git retains former states, and rsync puts the data somewhere else, as elaborated above.

Backups are isolated from the user account, so that the user account cannot break them.

That avoids most single points of failure (tools, file systems, hardware/drives, automated tool invocation, accidental user deletion, compromise of or accidents on the user account). Ignored/accepted risks: takeover of the root account, or very special kernel bugs :classic_smiley:

Obviously, as you suggest, keeping some of that offline/detached is an option to mitigate these risks as well → you can, as I do when I take my laptop away from the stationary backup equipment, mount the backup file systems with nofail and make the backup scripts depend on the backup drives being mounted. That way you don’t need to do much: if the system is booted with (some of) the devices attached, they auto-mount and then auto-backup; otherwise, nothing happens.
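A sketch of that guard, with /mnt/backup as a placeholder mount point: the fstab entry uses nofail so boot doesn’t block on a missing drive, and the wrapper exits quietly when the drive is absent.

```shell
#!/usr/bin/env bash
# /etc/fstab entry (placeholder UUID):
#   UUID=xxxx-xxxx  /mnt/backup  xfs  defaults,nofail  0 2

# Backup wrapper: only run when the backup drive is actually mounted.
BACKUP_MNT="${BACKUP_MNT:-/mnt/backup}"

if ! mountpoint -q "$BACKUP_MNT"; then
    echo "backup drive not mounted, skipping"
    exit 0
fi

echo "backup drive present, running backup"
rsnapshot daily
```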

Implications:

  • rsnapshot is super efficient and quick as long as it contains no large files that change, because hard links mean nothing but file system metadata is duplicated when a file does not change. But it can become a mess when large, changing files end up in rsnapshot backups: e.g., if just 1 bit of a 10 GB file changes, the whole 10 GB file is backed up again. So 3 consecutive backups, each with 1 bit of the 10 GB file changed, use 40 GB in total.
  • git should not be used on binary data. Rule of thumb: if it is human-readable, git is OK; if not, git can be slow and inefficient. As reliable and flexible as it is, it is not the most efficient tool anyway, since it was never intended for backups. But my experience is that the more critical and important something is (or can become), the smaller it usually is (e.g., documents).
  • be careful if you combine the two solutions: if git commits its changes daily, its (growing) files will trigger regular rsnapshot backups, each containing all modified git files. Therefore, in all circumstances, ensure the .git folder of the backup repo is excluded from rsnapshot backups (which does not mean you cannot use rsnapshot for git repo backups → this issue is about git-based backup solutions). E.g., put .git in another directory and link it into the backup repo; then only the link is backed up, which obviously doesn’t change.
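One way to implement the “.git elsewhere” trick from the last bullet is git’s built-in --separate-git-dir, which leaves only a tiny pointer file in the working tree. A sketch in a scratch directory (substitute real paths in practice):

```shell
# Demo in a throwaway directory; use your real paths in practice.
base=$(mktemp -d)
mkdir -p "$base/git-meta" "$base/backup-repo"

# Keep the git object store outside the tree that rsnapshot sees.
git init -q --separate-git-dir="$base/git-meta/docs.git" "$base/backup-repo"

# The working tree now holds a small ".git" pointer file instead of the
# full object store, so rsnapshot only ever re-copies this one file.
cat "$base/backup-repo/.git"   # prints: gitdir: .../git-meta/docs.git
```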

I also keep some things on M-Discs, but refreshing them regularly is a little problematic, so that is more for data that neither changes over long periods nor accrues often. That might be an option for you too, though I would not rely on it alone (again, it can become a single point of failure).

All of that is available by default on Fedora repositories and maintained.

Maybe some of that is useful, or may serve as incentive to rationalize some risks or so :classic_smiley:

Supplement: if applicable, don’t forget to embed/align your backup strategy into/with your encryption/access strategy :classic_smiley:


Not sure this is the best choice, as XFS doesn’t checksum file data and wouldn’t detect bit rot. You may end up with a broken backup and not notice until you need it.

I’d rather back up to a mirrored ZFS pool (with a regular scrub once or twice per year), then replace disks as they get old or start to fail. How you back up doesn’t really matter that much (rsync, rsnapshot, deja-dup using restic, …) as far as keeping your data safe is concerned.

Also, ZFS makes it easy to snapshot existing backups (datasets) and send them to a second backup system/location.
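As a sketch (pool, dataset, device, and host names are placeholders), setting up and maintaining such a pool takes only a few commands:

```shell
# Create a two-disk mirror; silent corruption on one disk can be
# repaired from the other.
zpool create backup mirror /dev/sdb /dev/sdc

# Once or twice a year: verify every checksum and repair from the mirror.
zpool scrub backup
zpool status backup    # shows repaired/unrepairable errors after the scrub

# Replicate a backup dataset to a second system/location.
zfs snapshot backup/docs@2025-06
zfs send backup/docs@2025-06 | ssh otherhost zfs receive -F tank/docs
```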

XFS is the suggested default for critical data on most enterprise distributions, incl. the majority of commercial Linux distributions for enterprise use cases. They put either all partitions or at least their critical data partitions on XFS (incl. SUSE, which uses Btrfs for the system partition but suggests XFS for data partitions, and Red Hat, which suggests XFS in general).

While checksumming has its advantages in many respects and looks great on paper, it also adds a lot of complexity at an already-complex low level, which increases the likelihood of failures and bugs, which e.g. Btrfs still occasionally has. Also, checksumming at the file system level is not a prerequisite for data integrity. It is definitely great to have in some circumstances, but I’m not sure its advantages for backups outweigh its risks if proper tools are used.

There was, a long time ago (15–20 years?), one bug in XFS that caused some trouble, but since then it has proven to be more or less the most reliable file system. That’s why the “big players” who have to guarantee data safety at the SLA level always focus on XFS. However, we also have to be clear that a current mainline kernel will not be as reliable as a long-term-support kernel that has passed all the testing up to Alma, Rocky, SUSE, RH, etc. and gets only the necessary updates but no new features (incl. with regard to file systems) → but that issue applies to all file systems.

The problem with these ZFS solutions is that they do not get the massive testing of the vendors and communities, as they are rejected by the kernel community. It can be argued whether ZFS should be added to the kernel or not, but so far it is not. Effectively, the overall constellation deployed here is not officially supported and as such should be considered testing.

There are advantages to ext4 and xfs that trump the use of more complex options, especially for backup servers.

With NVMe supporting T10 DIF and SNIA DIX as part of the base NVMe 2.0 spec, if I understand correctly, checksumming data at the file system level is redundant.

I would rather do both end-to-end data protection (e2edp) and encryption (SED) in hardware. My newest laptop is 2021 vintage, so it predates NVMe 2.0. My backup server doesn’t even have a TPM.

I wonder if Fedora supports reporting on e2edp? fio is installed on my F43 laptop. I would love to have this tested out and reported back here ;-)


I tried running a test to see if an NVMe device supported e2edp

nvme id-ns -H /dev/nvme0n1

but Metadata Size is 0 so no-go


I tried running a test to see if an NVMe device supported OPAL

nvme sed discover /dev/nvme0n1
Error: ioctl IOC_OPAL_DISCOVERY failed

I think this means Fedora is built such that OPAL cannot work even if the SSD supports OPAL. Bummer. Hopefully I’m wrong.


My preferred approach for things like this is to burn them to DVD/CD-ROM. The benefit is that pretty much any OS can read them, and if your docs are PDF/TIFF-G4/PNG, then most OSes also have readers for the files stored on the DVD/CD-ROM. Another good thing is that because they are ROM, you have some measure of non-repudiation in the case of things like contracts/legal documents… They last at least 10 years, with several decades being the norm, and they are considerably more durable than any HDD/SSD/tape.
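For the burning step, one sketch uses growisofs (the device path and source directory are placeholders; xorriso is an alternative):

```shell
# Master and burn a data DVD in one step.
# -R -J add Rock Ridge + Joliet so both Linux and Windows can read the disc.
growisofs -Z /dev/sr0 -R -J ~/important-docs/

# Later sessions can be appended with -M instead of -Z, until the disc
# is finalized.
growisofs -M /dev/sr0 -R -J ~/more-docs/
```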

$ grep -i opal /boot/config-6.17.9-300.fc43.x86_64
CONFIG_BLK_SED_OPAL=y

Seems the kernel does have support enabled.


Now I am even more curious.

I’ve started perusing the NVMe Integrator’s List to see if I can find a device with both e2edp and SED to purchase.