Directories safe to exclude from system backups

What directories are safe to exclude from system backups?

Specifically, is it ok to omit /var/cache/dnf/, which is ~500MB and often changing, and/or ~/.mozilla/firefox/[profile].default/storage/default/ ?

My daily backups end up including a lot of added or modified files in those directories, which might be unnecessary.

I already exclude the following list of directories, as well as any that contains a CACHEDIR.TAG file:

        - /dev/*
        - /proc/*
        - /run/*
        - /sys/*
        - /tmp/*
        - /var/lock/*
        - /var/run/*
        - /var/tmp/*
        - /root/.cache/*
        - /home/*/.cache/*
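
To see which directories the CACHEDIR.TAG rule actually covers, a rough sketch like the one below could help. The function name `list_tagged_cache_dirs` is made up for illustration, and `-xdev` (stay on one filesystem) is my own assumption, not something from the question. It checks the 43-byte signature that the Cache Directory Tagging Specification requires, so stray files that merely share the name are skipped.

```shell
# list_tagged_cache_dirs [ROOT]   (hypothetical helper name)
# Prints every directory under ROOT (default /) that holds a valid CACHEDIR.TAG.
list_tagged_cache_dirs() {
    find "${1:-/}" -xdev -type f -name CACHEDIR.TAG -exec sh -c '
        sig="Signature: 8a477f597d28d172789f06886806bc55"
        for tag in "$@"; do
            # Only the first 43 bytes are checked; the rest is free-form.
            if [ "$(head -c 43 "$tag" 2>/dev/null)" = "$sig" ]; then
                dirname "$tag"
            fi
        done
    ' sh {} + 2>/dev/null
}

# Example: list_tagged_cache_dirs /home
```

Run it as root to see tags in directories your own user cannot read.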

In my opinion, a typical desktop/laptop rarely requires system-level backups.
A user-level backup is enough in most cases, supplemented by the list of manually installed packages and /etc.
Everything else is easier to restore with the package manager, unless internet access is really problematic.
The most common locations to exclude from user-level backups are:

~/.cache
~/.local/share/Trash

Other locations mostly depend on personal preferences.
You can run rsync --progress to determine what else to exclude.
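
As a complement to watching rsync --progress, a small sketch like this could show where the daily churn comes from. It is not what rsync does internally: it only counts files by modification time, and the name `recent_churn` and the `-xdev` flag are assumptions of mine. File names containing newlines would be miscounted.

```shell
# recent_churn ROOT [DAYS]   (hypothetical helper name)
# Counts files under ROOT modified within the last DAYS days (default 1),
# grouped by containing directory, busiest directory first.
recent_churn() {
    find "$1" -xdev -type f -mtime "-${2:-1}" 2>/dev/null |
        sed 's|/[^/]*$||' |
        sort | uniq -c | sort -rn
}

# Example: recent_churn "$HOME" | head -n 20
```

Directories that show up near the top every day are the first candidates to inspect before adding them to an exclude list.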

That is quite a common opinion. However, at this time, I don’t wish to have the hassle of deciding which files and directories are necessary since not all programs follow conventions on where to store config files, etc. I especially don’t want unpleasant surprises in the event I need to use the backup.

In any case, this is a divergence from the original question.

I think you can safely exclude /var/cache/dnf/ as long as your internet connection is reliable, since the dnf cache can easily be rebuilt whenever you need it.

As for the storage folder in the Firefox profile, it looks like it’s used for websites’ offline data as well as for data stored by some Firefox extensions.

It isn’t listed as important data to restore in Mozilla’s support article.

Here are people discussing what it’s for:

https://www.reddit.com/r/firefox/comments/cq3w0d/what_is_the_storage_folder_for_in_my_firefox/

https://groups.google.com/forum/#!topic/mozilla.support.firefox/2ZDLm8qADyQ

Thanks. It seems like there’s no clear answer for the firefox storage folder. Sounds like it should mostly be ok to exclude… except maybe not for some “rogue” extensions?

/var/cache/dnf seems clearer, but still nothing definitive? The broader /var/cache seems more murky; I think I read that some files or directory structures might be expected by some programs?

It would be great to have a definitive list of directories that are safe to exclude. Maybe that’s not possible since it depends on other programs following conventions.

Testing a backup/restore could be one way to check, but even that’s not a guarantee of avoiding problems in future/different situations.

Certainly:

$ rpm -q -f /var/cache/* 
dnf-4.2.7-2.fc30.noarch
file /var/cache/fwupd is not owned by any package
gdm-3.32.0-3.fc30.x86_64
file /var/cache/ibus is not owned by any package
glibc-2.29-15.fc30.x86_64
libvirt-daemon-5.1.0-9.fc30.x86_64
libX11-common-1.6.7-1.fc30.noarch
man-db-2.8.4-4.fc30.x86_64
systemd-241-10.git511646b.fc30.x86_64

Does that imply those programs will choke if those directories don’t exist? If they recreate them, that would be fine. But the uncertainty is worrisome.

Completely or partly, but surely.

They may not even have sufficient privileges if they run as a limited user.
And even if they do, they may not expect to have to recreate the directories.
Also there’s ownership, permissions, ACLs, SELinux labels…
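
One way to hedge against that, sketched under my own assumptions: record the directory skeleton of an excluded tree (mode, numeric owner, numeric group, path) alongside the backup, and recreate the empty directories after a restore. The helper names are hypothetical, `stat -c` is the GNU coreutils form, and ACLs and SELinux labels are not captured here; labels would still need something like restorecon afterwards.

```shell
# save_dir_skeleton ROOT   (hypothetical helper name)
# Prints one line per directory under ROOT: octal mode, uid, gid, path.
save_dir_skeleton() {
    find "$1" -xdev -type d -exec stat -c '%a %u %g %n' {} + 2>/dev/null
}

# restore_dir_skeleton < FILE   (hypothetical helper name)
# Recreates the recorded directories; existing ones are left untouched.
restore_dir_skeleton() {
    while read -r mode owner group path; do
        [ -d "$path" ] && continue
        mkdir -p "$path" && chmod "$mode" "$path" &&
            chown "$owner:$group" "$path"
    done
}

# Example (as root):
#   save_dir_skeleton /var/cache > /root/var-cache-skeleton.txt
#   restore_dir_skeleton < /root/var-cache-skeleton.txt
```

Restoring ownership for other users needs root; as an unprivileged user the chown only succeeds for your own files.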

That’s it.
When the user and their apps respect the FHS, the “inclusive” approach is simply a more efficient basis.

Whenever I spend substantial time on something, I verify that it is saved and backed up properly and/or document how to reproduce the result.
This really helps a lot in case of emergencies.

Therefore it may be best to exclude only directories for programs which can be relied upon to respect the standard, which hopefully includes Fedora’s core components such as dnf.

I think I will exclude only /var/cache/dnf/* for now, on this basis, considering also that this cache is almost guaranteed to be stale if recovered, and because it is always externally sourced and not internally generated (such as logs).

Of course, some kind of official confirmation that this is safe would alleviate worries.

I did a test backup/recovery with /var/cache/dnf/* excluded. It seems fine.

There is initially an error with, for example, dnf info -C borgbackup:

[Errno 13] Permission denied: ‘/var/cache/dnf/expired_repos.json’

because that file does not exist. After a dnf upgrade --refresh which repopulates /var/cache/dnf/, the error went away.

For your information, these are the directories I exclude, in addition to what you already listed. I’m not terribly worried about having to recreate an occasional empty directory, so I exclude the entire /var/cache as you can see. For the same reason I exclude the entire temp directories, as I want to exclude any dot-files in them.

This is my personal choice on our systems, you can consider if you want to add any of them to your list.

/media/*
/home/*/.ccache
/home/*/.config/libreoffice/*/user/uno_packages/cache
/home/*/.local/share/Trash
/home/*/.macromedia
/home/*/.mozilla/*/*/storage
/home/*/.mythtv/cache
/var/cache
/var/crash/*
/var/spool/squid
/var/tmp

Thanks for sharing your list.

Some directories are more elegantly omitted by the CACHEDIR.TAG file, particularly .ccache, also some program caches, and several directories in /var/cache/man.

Several of the other directories either don’t exist or are empty or are very small on my system.
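
For anyone comparing lists, a quick sketch like this shows which candidates exist on a given system and how big they are. The name `report_candidates` is made up for illustration. Globs are passed unquoted so the shell expands them; a pattern that matches nothing is reported as missing.

```shell
# report_candidates PATH...   (hypothetical helper name)
# For each path prints its size in KiB, or "missing" if it does not exist.
report_candidates() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            printf '%s\t%s\n' "$(du -sk "$p" 2>/dev/null | cut -f1)" "$p"
        else
            printf 'missing\t%s\n' "$p"
        fi
    done
}

# Example with a few entries from the list above:
# report_candidates /var/cache /var/crash /home/*/.ccache /home/*/.macromedia
```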

I’m guessing /media/* is omitted to avoid USB/external devices? Those show up for me on /run/media/* and backup options like borg’s --one-file-system take care of such problems.

Nothing creates a CACHEDIR.TAG in .ccache as far as I know. I wouldn’t want to add this manually in the home directory of every user on my systems. (There are just a few, but I wouldn’t know if they remove and recreate the directory for some reason. It also feels a bit wrong.) So I find it better to exclude them in the file. I take advantage of any CACHEDIR.TAGs automatically created, but I don’t create such files manually.

When it comes to --one-file-system, it wouldn’t work for me since my backups span more than one file system.

I guess the bottom line is there isn’t really any perfect list here. You have a list that suits you, I have one that suits me. But discussions like these can give us inspiration for improvements, so they can be useful all the same. Thanks for sharing your list!

Yes, definitely. The more the merrier.

Hmm it isn’t automatically created for you? Note that on my system they are (automatically) in /home/*/.ccache/*/CACHEDIR.TAG. What does sudo locate CACHEDIR.TAG show?

My backups also span more than one file system. I see a couple of options:

  1. Everything is included by default, and exclusions are specified manually.
  2. Everything (outside main filesystem) is excluded, and inclusions are specified manually.

I took the approach of option 2 on the basis that it is more deterministic. Option 1 depends on systems and programs behaving consistently and following standards for things like where to mount temporary devices/volumes.
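
With option 2 it helps to know what sits outside the main filesystem in the first place. A minimal sketch, assuming GNU `stat` and a made-up function name, that compares device IDs one level below a root directory. It does not find nested mounts such as those under /run/media/*, and btrfs subvolumes may show up as separate filesystems too.

```shell
# other_filesystems ROOT   (hypothetical helper name)
# Prints directories directly under ROOT whose device ID differs from
# ROOT's, i.e. likely mount points of other filesystems.
other_filesystems() {
    root_dev=$(stat -c %d "$1") || return 1
    for d in "${1%/}"/*/; do
        [ -d "$d" ] || continue
        [ "$(stat -c %d "$d")" = "$root_dev" ] || printf '%s\n' "${d%/}"
    done
}

# Example: other_filesystems /
```

Anything it prints is a candidate for an explicit include; everything else is already covered by backing up the main filesystem.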

Just a note, you can also use sudo dnf makecache specifically for this – although any command with --refresh will do too, as you’ve said.

Makecache Command

dnf [options] makecache

Downloads and caches metadata for all known repos. Tries to avoid downloading whenever possible (e.g. when the local metadata hasn’t expired yet or when the metadata timestamp hasn’t changed).

dnf [options] makecache --timer

Like plain makecache, but instructs DNF to be more resource-aware, meaning it will not do anything if running on battery power and will terminate immediately if it’s too soon after the last successful makecache run (see dnf.conf(5), metadata_timer_sync).

Um, looking again, yes it is. Checking the release notes I see it has been that way for a while. I guess I’m revealing the age of my exclusion list, and thereby of me.

Thanks for pointing it out, I can simplify my list a little.

I agree. I only include my /home directory with my backup scheme:

rsync --archive --delete --progress --human-readable --exclude-from '/home/username/bin/rsync-exclude-list.txt' ~/ /run/media/username/Backup/home-backup | tee ~/rsync.log

Notes

  • I keep my exclude list in a separate file (see below)
  • I use cron to run rsync each evening at 11:30pm
  • I write the rsync output to a log file, which also serves as proof that the command ran
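
On the logging point, one possible variation, sketched with a made-up helper name: piping through tee overwrites the log on each run and the pipeline reports tee's exit status rather than rsync's. A wrapper that appends timestamped start and end lines keeps a history and preserves the command's own status. The crontab line in the comment is hypothetical, as is backup.sh.

```shell
# run_logged LOGFILE COMMAND [ARGS...]   (hypothetical helper name)
# Appends COMMAND's output to LOGFILE between timestamped start and end
# lines, and returns COMMAND's own exit status.
run_logged() {
    log=$1; shift
    printf '%s start: %s\n' "$(date '+%F %T')" "$*" >> "$log"
    status=0
    "$@" >> "$log" 2>&1 || status=$?
    printf '%s end: exit status %s\n' "$(date '+%F %T')" "$status" >> "$log"
    return "$status"
}

# Hypothetical crontab entry for 11:30pm daily (backup.sh is an assumed script
# that calls run_logged with the rsync command):
# 30 23 * * * /home/username/bin/backup.sh
```

Unlike tee, this does not echo anything to the terminal, which is usually fine for a cron job.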

Exclude list

.cache
.dbus
.gvfs
.local/share/gvfs-metadata
.local/share/Trash
node_modules
Trash-1000
.DS_Store