Good approach for daily backup of data on Fedora using Btrfs

I’m running Fedora Server 41 on my home server, and I have the following setup:

  1. /mnt/data – an 8TB internal drive (Btrfs filesystem) where I store non-OS related data like Podman container volumes, shared data for SMB users, and other files.
  2. /mnt/backup – an 8TB external USB drive (also Btrfs filesystem) that is always connected and mounted, where I want to back up the data from /mnt/data on a daily basis.

I’m looking for the best approach to automate this backup process and ensure it runs smoothly. Both drives are formatted with Btrfs, so I’m considering using tools like rsync, Btrfs snapshots, or Snapper to handle the backup.

It would be nice to monitor the backup process using Cockpit, but it’s not a must-have requirement.

Just to note, I am new to Fedora. Previously, my home server was running Windows, and I was using the built-in backup tool in Windows to handle my backups.

Could someone suggest the most efficient and reliable method for backing up this data on a daily basis? Also, should I rely on Btrfs snapshots, rsync, or another approach, given that both drives use Btrfs?

Thanks in advance for your suggestions!

I use btrbk to manage backups: https://digint.ch/btrbk/

It’s pretty simple to use: you describe your backups in a configuration file and then schedule the btrbk script to run periodically (using the task-scheduling tool of your choice; the btrbk documentation suggests cron, but I use systemd timers and services instead). The documentation at the above link gives a number of example configurations; you’ll probably want to base yours on the “backups to USB disk” example.
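
To give a rough idea, a stripped-down configuration for a layout like yours might look something like this. This is only a sketch: I’m assuming your data lives in a subvolume named “data” under the pool mounted at /mnt/data, that the snapshot and target directories already exist, and the retention values are just examples; check it against the docs and run btrbk dryrun before trusting it:

# /etc/btrbk/btrbk.conf (sketch only; adjust names, paths and retention)
timestamp_format        long
snapshot_preserve_min   2d
snapshot_preserve       14d
target_preserve_min     no
target_preserve         30d 10w *m

snapshot_dir            .snapshots

volume /mnt/data
  target /mnt/backup/btrbk
  subvolume data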

You can manage backups for multiple btrfs volumes in one configuration file, and you can have a separate retention policy for each btrfs subvolume if you choose. My own btrbk setup has a systemd timer that takes hourly local snapshots of /, /var, and /var/home (and clears out old snapshots according to the snapshot retention policy), plus a systemd service that runs whenever I plug in my external backup SSD (which I don’t normally keep plugged in), transfers local snapshots to the backup, and clears out old backups according to the backup retention policy (e.g. keeping only weekly backups after 30 days, and only monthly backups after 12 weeks).
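
For what it’s worth, the timer-driven half of that can be as small as a oneshot service plus a timer; the unit names, the hourly schedule and the btrbk path below are illustrative rather than my exact files:

# /etc/systemd/system/btrbk.service
[Unit]
Description=btrbk snapshot and backup run

[Service]
Type=oneshot
ExecStart=/usr/bin/btrbk run

# /etc/systemd/system/btrbk.timer
[Unit]
Description=Run btrbk hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

Enable it with systemctl enable --now btrbk.timer.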

I’m a backup junkie and I found all Linux GUI backup apps to be lacking. On Windows I use Retrospect, which does deduplicated, compressed, encrypted, block-level (backing up only the parts of files that changed, not the entire files) and searchable backups with multiple sources, destinations and schedules.

On Linux, I settled on an rsync backup script that mimics Apple’s Time Machine. It allows the most flexibility (the way I see it), and I can search the resulting backups, since backups are just directories you can browse with your file manager:

The above script doesn’t have any retention options, though, so you just have to delete old backups manually; it’s safe to do so.

There is another script that adds retention options, but that one has (had?) issues when the source is on a BTRFS volume, so I stopped using it:

Both can back up to local volumes and over ssh/sftp. Neither is a BTRFS-specific script that takes advantage of BTRFS snapshot features; it’s just normal rsync.

No compression, no block-level backups, and no proper deduplication (though by the nature of the script it’s kind of deduplicated), but most of my files are incompressible anyway. Backups can be encrypted too, but that’s way too advanced for me.

You need another script to invoke it, and it accepts normal rsync options, so you can add an exclusion list, for example, or add progress and verbose options and write the output to a log file, or run it silently with no options at all. To change the format of the backup destination name, though, you need to edit the script itself, but it’s quite easy. Then you schedule your wrapper script with cron to run at the times you want.

The script needs source and destination as basic options, ex: ./timemachine-backup.sh /home/anon/ /backup. Then you can add a file with exclusions and a log option. Like so:

#!/bin/bash
/home/anon/scripts/timemachine-backup.sh /home/anon/ /backup/hourly/ --progress --verbose --exclude-from /home/anon/scripts/excluded-patterns.txt > /home/anon/backuplogs/daily/backup-daily-log-"$( date '+%Y-%m-%d--%H.%M.%S' )".log


Personally, for data backup, I rely on Pika (Borg) because it gives me more flexibility.
You can also create read-only snapshots and send them to the backup, which provides both a rollback option on the active filesystem and a backup on the destination filesystem. However, this requires a more complex setup and subvolume management to include or exclude specific data from the backup.

I rely on Btrfs snapshots for system restoration and for backing up the entire system.

For me anything based on Borg is out since the backups are not searchable. Has that changed?

To find a file, you need to know the exact date it existed so you can mount that particular backup. You can’t just search for a file name across time or quickly restore multiple versions of the same file across several weeks or months.

Otherwise, if you don’t care about your backups being searchable, then Pika is probably the most polished of them all.

Thank you very much for sharing those tools. I am new to Linux, so I am not too familiar with them yet, but I am going to look into them. For now, this is what I have created: a script that takes snapshots and backs them up to a different drive (the external USB drive). Along with the snapshots, I am also backing up the data itself. Here is what my script looks like:

#!/bin/bash

# Set variables
DATA_DIR="/mnt/data" #8TB internal drive
BACKUP_DIR="/mnt/backup" #8TB external usb drive
SNAPSHOT_DIR="$DATA_DIR/.snapshots"
BACKUP_SNAPSHOT_DIR="$BACKUP_DIR/.snapshots"
DATE=$(date +%Y-%m-%d)
SNAPSHOT_NAME="snapshot_$DATE"
RETENTION_DAYS=10

# Ensure snapshot directories exist
sudo mkdir -p "$SNAPSHOT_DIR"
sudo mkdir -p "$BACKUP_SNAPSHOT_DIR"

# Create a new read-only snapshot of /mnt/data (btrfs send requires read-only snapshots)
sudo btrfs subvolume snapshot -r "$DATA_DIR" "$SNAPSHOT_DIR/$SNAPSHOT_NAME"

# Copy the snapshot to /mnt/backup using btrfs send/receive
sudo btrfs send "$SNAPSHOT_DIR/$SNAPSHOT_NAME" | sudo btrfs receive "$BACKUP_SNAPSHOT_DIR"

# Sync live data to /mnt/backup, skipping the snapshot directory so rsync
# doesn't descend into (or try to delete) the read-only snapshots
sudo rsync -av --delete --exclude='/.snapshots' "$DATA_DIR/" "$BACKUP_DIR/"

# Clean up snapshots older than $RETENTION_DAYS, based on the date embedded in
# the snapshot name (a snapshot's directory mtime is copied from the source,
# so "find -mtime" is not a reliable age check here)
CUTOFF=$(date -d "$RETENTION_DAYS days ago" +%Y-%m-%d)
for snap in "$SNAPSHOT_DIR"/snapshot_* "$BACKUP_SNAPSHOT_DIR"/snapshot_*; do
    [ -d "$snap" ] || continue
    if [[ "${snap##*_}" < "$CUTOFF" ]]; then
        sudo btrfs subvolume delete "$snap"
    fi
done

# Done
echo "Backup and snapshot cleanup complete!"

Using cron, I have scheduled it to run every night at 2 AM.
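
The crontab entry is roughly this (the script and log paths are placeholders for wherever you keep yours):

# m h dom mon dow  command
0 2 * * * /root/scripts/btrfs-backup.sh >> /var/log/btrfs-backup.log 2>&1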

Do you guys see any major issue with this approach?

Sorry, I’m a Linux noob too and I’m not familiar with BTRFS snapshots yet (I’ve postponed learning the BTRFS stuff to some unspecified time in the future), so I can’t comment on that. Basic rsync suits my needs for now.

This looks interesting… I think I am going to explore this utility… Thanks for the suggestion…

+1 for btrbk, which leverages btrfs snapshots and very cheap incremental replication to another local or remote btrfs.

In particular, I like that each replicated snapshot on the remote btrfs is complete. While the send stream is incremental, the receive side starts by snapshotting the most recent snapshot it already has, and the send stream is then applied to this new snapshot, catching it up to date.

This means you don’t need to depend on special tools for restoring from your backup. The most recently replicated read-only snapshot is a complete representation of the state of the (source or parent) subvolume. You can treat it like a subdirectory and either graphically copy files out, or rsync whole or partial directories, or single files.
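
For example, pulling a directory back out of the most recent replicated snapshot is just an ordinary copy; the snapshot name here is made up, use whatever btrbk actually created on your backup disk:

# restore one directory from a read-only snapshot on the backup drive
rsync -av /mnt/backup/btrbk/data.20250101T0200/projects/ /mnt/data/projects/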

Because of the nature of Btrfs file b-trees, much of the work is done during normal use, which makes it very cheap for btrfs to locate changed files. Deep traversal of every directory and file is therefore not required for btrfs to know which files have changed between generations. Other utilities often do require a deep traversal, on either the source or destination side of the backup process, and sometimes on both. Btrfs requires no deep traversal on either side, so it’s especially useful when there are many files and directories but few changes.
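
If you ever do this by hand rather than through btrbk, the incremental replication it relies on boils down to passing the previous read-only snapshot as the parent (snapshot names below are illustrative):

# first replication: full send
sudo btrfs send /mnt/data/.snapshots/snap-2025-01-01 | sudo btrfs receive /mnt/backup/.snapshots
# later replications: send only the difference from the previous snapshot
sudo btrfs send -p /mnt/data/.snapshots/snap-2025-01-01 /mnt/data/.snapshots/snap-2025-01-02 | sudo btrfs receive /mnt/backup/.snapshots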


Btrfs snapshots work fine for local backups (as does rsync with hardlinks if you do not have a Btrfs filesystem). However, if you want to have a 3-2-1 backup strategy, with one backup off-site, things become more complicated because you then need some kind of filesystem access on a remote system. While this is not impossible (a server system from a hosting provider with SSH access will do nicely), it is more involved than, say, some S3 compatible object storage, which is purpose built for remote, API-accessible data storage without having to manage your own server. (Note, there is https://rsync.net, which offers a similar, rsync-compatible data storage service. I have not looked into this in more detail as there are far more alternatives for object storage and thus more competition in this space.)

I use restic to snapshot my data onto a second drive locally and then copy (new) snapshots to a Backblaze B2 bucket every night (I am not affiliated with Backblaze; it is just what I settled on after comparing it to S3 and the other two big cloud storage players). You can of course combine Btrfs snapshots with restic for an off-site backup, but using a single tool for both allowed me to simplify my setup.
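
Stripped down, that nightly job is essentially the following (repository paths and the bucket name are placeholders, and I’m leaving out the password/credential environment variables restic needs for both repositories):

# back up /mnt/data into the local repository on the second drive
restic -r /mnt/backup/restic backup /mnt/data
# copy any new snapshots to the off-site B2 repository
restic -r b2:my-bucket:restic copy --from-repo /mnt/backup/restic
# apply the retention policy and drop data no snapshot references any more
restic -r /mnt/backup/restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --prune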

Snapshots in restic are also very efficient (not as efficient as copy-on-write ones, of course) for slowly changing data sets. Data is deduplicated by splitting files into chunks, and a snapshot only stores new chunks while referencing older, unmodified ones. This makes snapshots fast to create (so you can snapshot often) and small (only new or changed data needs to be stored and transferred to an off-site location).

Also note, there are other, very similar backup tools like Borg Backup (command line tool, like restic) or duplicati (.Net based, with a web GUI). Years ago, I chose restic because it fit my use case best and I have not had any problems with it that would have made me switch to something else.

If you’re in the EU, Hetzner offers a low-priced “Storage Box”. They do have data centers in the US and in Singapore as well, but I’m not sure about pricing outside the EU.

It’s just a chunk of space accessible via sftp, not a full virtual host. It’s like €5/month for 1TB. They also have 5TB and 10TB plans.

If I ever figure out how to do encrypted backups using the Time Machine rsync script, or maybe learn restic, this is what I’d do.

I ended up going with several of the previously mentioned tools for my backup strategy.

I use btrbk to create daily snapshots of root and home, which it then sends to another drive as backups.

I also use restic to backup my most important files to Backblaze B2 daily.

Then finally I occasionally use CloneZilla to keep a complete image on a USB drive that I keep offsite.