How to protect your data

Hi all, I would like to achieve these goals:

  1. Keep important data backed up
  2. Keep important data private

While the first is fairly simple to achieve (e.g. using proprietary cloud services), I wonder how to achieve the second at the same time.

Right now I’m using a stacked cryptographic file system, encfs, for encryption. Then I upload my files to the cloud service through a client. It works fine for me, but both encfs and ecryptfs (an alternative I considered) seem like somewhat outdated projects.
I can’t run my own server, so I can’t use Nextcloud. I don’t think I can use LUKS if I don’t want to upload an entire disk every time.
So here is the question: is there a more modern, smart way to do this?
Any advice would be appreciated.
Many thanks in advance. Paolo

Personally I would add:
3. Maintain physical control of your own hardware.
But I’m a dinosaur.

Have you considered simply using ‘tar’ and ‘gpg’ to create encrypted archives of the files you want to back up? They’d be portable and secure no matter where you put them. As long as you keep a physical backup of your gpg keys on hand, you’ll always be able to decrypt them.

It’s not an elegant, all-in-one solution, but it can be pretty reliable and is fairly easy to set up and automate. Plus, you can use the built-in compression to save space on uploads.

This probably isn’t exactly the solution you’re looking for, but it’s a solution.
I just happen to be a fan of simple command automation.
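
To make the tar + gpg idea concrete, here is a minimal sketch that encrypts the archive to your own public key, so an automated job never needs a passphrase, and exports the secret key for the offline copy mentioned above. Assumptions: GnuPG 2.1 or later, GNU tar, a throwaway keyring in a temporary GNUPGHOME, a hypothetical user ID, and a key created without a passphrase purely so the demo runs unattended; a real key should be passphrase-protected.

```shell
#!/usr/bin/env bash
# Sketch: tar + gpg public-key encryption, plus an offline copy of the secret key.
set -euo pipefail

WORK="$(mktemp -d)"
export GNUPGHOME="$WORK/gnupg"              # throwaway keyring for the demo
mkdir -m 700 "$GNUPGHOME"
KEY_UID="Backup <backup@example.org>"       # hypothetical user ID

# One-time: create a key pair (empty passphrase ONLY for this demo).
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-gen-key "$KEY_UID" default default never

# Sample data standing in for the files to protect.
mkdir -p "$WORK/data"
echo "invoice 2019-001" > "$WORK/data/billing.txt"

# Archive + compress, then encrypt to the public key (no passphrase needed).
tar -czf - -C "$WORK" data |
  gpg --batch --yes --encrypt --recipient "$KEY_UID" -o "$WORK/backup.tar.gz.gpg"

# Offline copy of the secret key: print it or put it on a USB stick in a drawer.
gpg --batch --pinentry-mode loopback --passphrase '' \
    --armor --export-secret-keys "$KEY_UID" > "$WORK/secret-key.asc"

# Restore: only someone holding the secret key can do this.
mkdir "$WORK/restore"
gpg --batch --quiet --decrypt "$WORK/backup.tar.gz.gpg" | tar -xzf - -C "$WORK/restore"
```

Encrypting to a public key means the machine doing the backups only needs the public half; the secret key can live offline until you actually restore.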

@jtkleeme It would be great to have a NAS, but it is a bit expensive for me.
@cptgraywolf I had never considered using tar. I read that it also supports incremental backups. But here is the catch: right now I’m using rclone (an rsync-like tool for cloud storage) together with ecryptfs, which gives me partial, encrypted backups. Can I do this with tar? In any case, it’s an interesting alternative. I also prefer simple tools, you know, “do one thing well”.
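
On the tar question: GNU tar does incremental backups through a snapshot file passed with `--listed-incremental`. A minimal sketch, assuming GNU tar (BSD tar does not support this option) and hypothetical file names; each resulting archive can then be encrypted and uploaded just like a full one.

```shell
#!/usr/bin/env bash
# Sketch: GNU tar incremental backups driven by a snapshot (.snar) file.
set -euo pipefail

WORK="$(mktemp -d)"
mkdir -p "$WORK/data"
echo "first"  > "$WORK/data/a.txt"
echo "second" > "$WORK/data/b.txt"
sleep 1   # keep file timestamps clearly before the level-0 run

# Level 0: full backup; tar records the state of every file in data.snar.
tar --create --gzip --listed-incremental="$WORK/data.snar" \
    --file="$WORK/backup-0.tar.gz" -C "$WORK" data

# Change one file, then take a level-1 backup using a COPY of the snapshot,
# so the level-0 snapshot stays reusable for later level-1 runs.
sleep 1
echo "changed" >> "$WORK/data/b.txt"
cp "$WORK/data.snar" "$WORK/data-1.snar"
tar --create --gzip --listed-incremental="$WORK/data-1.snar" \
    --file="$WORK/backup-1.tar.gz" -C "$WORK" data

# The incremental archive holds only the directory entry and the changed file.
tar --list --gzip --file="$WORK/backup-1.tar.gz"

# Restore: extract the archives in order; /dev/null is the documented
# snapshot argument when extracting incremental archives.
mkdir "$WORK/restore"
for level in 0 1; do
  tar --extract --gzip --listed-incremental=/dev/null \
      --file="$WORK/backup-$level.tar.gz" -C "$WORK/restore"
done
```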

I use xdelta3 to get incremental backups, i.e., I store encrypted VCDIFF blobs from xdelta3: “xdelta3 -s previous-backup-unencrypted-not-compressed.tar <new_backup.tar | encryption-step >new-backup.vcdiff.encrypted”. So it is just a toolchain of tar + xdelta3 + something that can encrypt. I would personally recommend not using gpg for file encryption, since it does a lot of things and none of them well (IMVHO). This idea doesn’t fit directly into your current setup, but perhaps it can give you new ideas.

@steinar We’re getting there… any suggestions for the encryption method?

How much data are you talking about? If it is just for backups, I would consider tape. You maintain physical control of your data and can put it in a fireproof vault; a safety deposit box, for instance, works well. Tape is usually guaranteed to last 30 years or more in a stable environment. I used to work with tape all the time. You can pick up a refurbished LTO-5 tape drive on eBay for a few hundred dollars. They are now up to LTO-8, which holds 12 TB.
Of course, I’m also used to working with PB-scale NAS systems from a few different manufacturers. I also really liked working with Infortrend RAIDs for small jobs.
It is hard to give you advice without knowing more about how much data you have and what sorts of things you are protecting against. The nature of the data is also important. Are you talking about large contiguous files in excess of a TB, or millions of tiny ones in the 4 KB range? How well does the data compress? Seismic data, for instance, actually gets larger if you try to compress it, and compressing it hogs processing power, because it is highly random in nature. Other types of data, like text, user information, or uncompressed audio and video, compress very well. Do you have any legal or company guidelines that you are required to follow?
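
One quick way to answer the “how well does it compress” question is to compress a sample and compare sizes. In this sketch, bytes from /dev/urandom stand in for highly random data like the seismic example and `seq` output stands in for text; both samples are assumptions for illustration, and gzip is just one possible compressor.

```shell
#!/usr/bin/env bash
# Sketch: measure compressibility of sample data before choosing a compressor.
set -euo pipefail

WORK="$(mktemp -d)"
head -c 1000000 /dev/urandom > "$WORK/random.bin"  # stand-in for highly random data
seq 1 200000 > "$WORK/text.txt"                    # stand-in for text-like data

# Prints the gzip-compressed size as an integer percentage of the original size.
ratio() {
  local orig comp
  orig=$(wc -c < "$1")
  comp=$(gzip -c "$1" | wc -c)
  echo $(( comp * 100 / orig ))
}

RANDOM_PCT=$(ratio "$WORK/random.bin")
TEXT_PCT=$(ratio "$WORK/text.txt")
echo "random: ${RANDOM_PCT}%  text: ${TEXT_PCT}%"
```

Random input comes out at 100% or slightly above (the compressor can only add framing overhead), which is why compressing such data wastes CPU time.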

Well, ‘tar’ only creates archives, but that’s where ‘gpg’ comes in. You can use it to encrypt just about anything (not just files, but anything on stdin), as well as to create signatures that let you verify the files haven’t been tampered with.

Once you create an archive, you can run it through ‘gpg’ with whichever encryption you want. (I’ve used ‘gpg’ successfully in the past, though it’s been a while.)
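
For the tamper-detection part, a detached signature stored next to the archive lets you check it before restoring. A minimal sketch, assuming GnuPG 2.1 or later, a throwaway keyring, a hypothetical signer user ID, and a key without a passphrase purely so the demo runs unattended.

```shell
#!/usr/bin/env bash
# Sketch: detached gpg signature to detect tampering with a backup archive.
set -euo pipefail

WORK="$(mktemp -d)"
export GNUPGHOME="$WORK/gnupg"; mkdir -m 700 "$GNUPGHOME"
KEY_UID="Backup Signer <signer@example.org>"   # hypothetical user ID
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-gen-key "$KEY_UID" default default never

# A small archive standing in for a real backup.
mkdir "$WORK/data"; echo "medical record" > "$WORK/data/note.txt"
tar -czf "$WORK/backup.tar.gz" -C "$WORK" data

# Create the detached signature (backup.tar.gz.sig) next to the archive.
gpg --batch --pinentry-mode loopback --passphrase '' --local-user "$KEY_UID" \
    --detach-sign -o "$WORK/backup.tar.gz.sig" "$WORK/backup.tar.gz"

# Exit status 0 means the archive matches what was signed.
if gpg --batch --quiet --verify "$WORK/backup.tar.gz.sig" "$WORK/backup.tar.gz"; then
  ORIGINAL_OK=yes
else
  ORIGINAL_OK=no
fi

# Simulate tampering (append one byte) and verify again.
cp "$WORK/backup.tar.gz" "$WORK/tampered.tar.gz"
printf 'x' >> "$WORK/tampered.tar.gz"
if gpg --batch --quiet --verify "$WORK/backup.tar.gz.sig" "$WORK/tampered.tar.gz" 2>/dev/null; then
  TAMPERED_OK=yes
else
  TAMPERED_OK=no
fi
echo "original: $ORIGINAL_OK, tampered: $TAMPERED_OK"
```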

Sorry, @pmozzati, I have no recommendation for what to use for encryption. After I got fed up with GnuPG (when the switches needed to use gpg took up about as much space as a small file, I gave up :wink:; IMHO it was way too easy to use gpg wrong), I ended up writing my own thin wrapper around OpenSSL using the PyCA Python cryptography module. I cannot in good faith recommend something that has had no code reviews or security reviews, has exactly one user and one developer on the planet, and assumes the system it runs on is trusted. (For reference if you want to roll your own: )

@jtkleeme I just want a simple solution for my own files. They are definitely important, because I store medical information, bills and so on. But we are talking about a small number of files.

@steinar and @cptgraywolf… I think the common steps you both described are something like:
get incremental backup > compress > encrypt > upload
I don’t know exactly how to encrypt right now. I will look further for a solution to this point. Thank you.

If you use ‘tar’ to create your backups, you can do the backup and compression in a single step (I believe the easiest options are ‘z’ for gzip, ‘j’ for bzip2, and ‘J’ for xz).
That saves a step: backup & compression > encrypt > upload

Again, I haven’t used ‘gpg’ in a while, but it should be something like: "tar cvzf - <thing to backup> | gpg --symmetric --cipher-algo AES256 --sign -o <name of backup>.tar.gz.gpg"
Or something like that.
Encryption is usually the hard part, but you really only need simple symmetric encryption. As @steinar showed, you could write a custom program that calls OpenSSL, but that seems like a lot of extra work (though I could probably help whip something up in C if it came to that). This is what ‘gpg’ is designed for, and you only have to figure it out once to automate it.
I don’t see anything inherently wrong with steinar’s ‘ppvtcrypt’ option, but my Python’s not great and I’m not any sort of authority in cryptographic design.
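
An automation-friendly sketch of that tar | gpg pipeline (note that the option is spelled `--cipher-algo`). Assumptions: GnuPG 2.1 or later, where `--pinentry-mode loopback` is needed for `--passphrase-file`; GNU tar; a hypothetical passphrase file readable only by you; and a hypothetical rclone remote called `mycloud`, left commented out. `--sign` is omitted here because it additionally needs a secret key in your keyring, and `--s2k-digest-algo SHA256` anticipates steinar’s advice about the KDF.

```shell
#!/usr/bin/env bash
# Sketch: backup & compression > encrypt > upload, runnable unattended.
set -euo pipefail

WORK="$(mktemp -d)"
export GNUPGHOME="$WORK/gnupg"; mkdir -m 700 "$GNUPGHOME"

PASSFILE="$WORK/backup.pass"   # hypothetical; e.g. ~/.config/backup.pass in real use
printf 'correct horse battery staple\n' > "$PASSFILE"
chmod 600 "$PASSFILE"

mkdir -p "$WORK/documents"
echo "bill 2019-03" > "$WORK/documents/bill.txt"

gpg_opts=(--batch --yes --quiet --pinentry-mode loopback --passphrase-file "$PASSFILE")

# Backup & compression > encrypt.
ARCHIVE="$WORK/documents-$(date +%Y%m%d).tar.gz.gpg"
tar -czf - -C "$WORK" documents |
  gpg "${gpg_opts[@]}" --symmetric --cipher-algo AES256 \
      --s2k-digest-algo SHA256 -o "$ARCHIVE"

# > upload (hypothetical rclone remote "mycloud"; uncomment to use).
# rclone copy "$ARCHIVE" mycloud:backups/

# Restore check: decrypt and unpack into a separate directory.
mkdir "$WORK/restore"
gpg "${gpg_opts[@]}" --decrypt "$ARCHIVE" | tar -xzf - -C "$WORK/restore"
```

Testing the restore right after creating the archive is cheap and catches a wrong passphrase file before you actually need the backup.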

Btw, if you switch to AES256 or another cipher with keys longer than 160 bits in GPG, remember to update the KDF as well, for instance with --s2k-digest-algo SHA256. The default for GPG is SHA-1. There is no point in having a long key if you throw away most of the entropy. As I mentioned, I think GPG is hard to use safely.
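
Since defaults differ between GnuPG versions, it can be worth checking what an encrypted file actually uses. `gpg --list-packets` shows the symmetric-key packet, where the numbers are OpenPGP algorithm IDs from RFC 4880 (cipher 9 = AES-256; hash 2 = SHA-1, hash 8 = SHA-256). A sketch assuming GnuPG 2.x and a hypothetical demo passphrase; the exact wording of the packet dump varies between versions.

```shell
#!/usr/bin/env bash
# Sketch: confirm which cipher and S2K digest a gpg-encrypted file uses.
set -euo pipefail

WORK="$(mktemp -d)"
export GNUPGHOME="$WORK/gnupg"; mkdir -m 700 "$GNUPGHOME"
printf 'demo-passphrase\n' > "$WORK/pass.txt"   # hypothetical demo passphrase
gpg_opts=(--batch --yes --quiet --pinentry-mode loopback --passphrase-file "$WORK/pass.txt")

echo "some data" > "$WORK/plain.txt"
gpg "${gpg_opts[@]}" --symmetric --cipher-algo AES256 \
    --s2k-digest-algo SHA256 -o "$WORK/plain.txt.gpg" "$WORK/plain.txt"

# Dump the packet structure; the "symkey enc packet" line names cipher and hash IDs.
PACKETS=$(gpg "${gpg_opts[@]}" --list-packets "$WORK/plain.txt.gpg" 2>&1)
echo "$PACKETS" | grep 'symkey enc packet'
```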

If all you’re doing is storing files to a remote server, it’d be nice to be able to generate, store, and re-use the key directly in the same way that you can generate and store an asymmetric key pair, rather than having to use a passphrase every time.

I know ‘gpg’ caches the passphrase, but having a sort of keyring for symmetric keys for use in cases just like this would greatly simplify things.
Generate a key of length x (either at random or using some form of PBKDF) once and reuse it for each file. You could treat it like a passphrase when encrypting with it, but it would (in theory) be far more secure. It would also give you a known length for your final key, so you know which algorithms you can use for the KDF.
Maybe you could even link it with a user along with their asymmetric key pair, that way you could know exactly what key to use by default for signing as well.

Kind of an odd use case, but would be nice.
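
Something close to this can be approximated with gpg today: generate a random secret once, store it in a file readable only by you, and pass it with `--passphrase-file` every time. Note that gpg still runs it through its S2K key derivation, so it acts as a high-entropy (here 256-bit) passphrase rather than a raw AES key. The key-file path and helper function names below are hypothetical; GnuPG 2.1 or later is assumed.

```shell
#!/usr/bin/env bash
# Sketch: generate a random "key file" once and reuse it for every backup file.
set -euo pipefail

WORK="$(mktemp -d)"
export GNUPGHOME="$WORK/gnupg"; mkdir -m 700 "$GNUPGHOME"
KEYFILE="$WORK/backup.key"   # hypothetical; e.g. ~/.config/backup/backup.key

# Generate once: 32 random bytes (256 bits), base64-encoded on a single line,
# created with permissions 600 thanks to the restrictive umask.
if [ ! -e "$KEYFILE" ]; then
  (umask 077; head -c 32 /dev/urandom | base64 > "$KEYFILE")
fi

# encrypt_file IN OUT / decrypt_file IN OUT: same key file for every file.
encrypt_file() {
  gpg --batch --yes --quiet --pinentry-mode loopback --passphrase-file "$KEYFILE" \
      --symmetric --cipher-algo AES256 --s2k-digest-algo SHA256 -o "$2" "$1"
}
decrypt_file() {
  gpg --batch --yes --quiet --pinentry-mode loopback --passphrase-file "$KEYFILE" \
      --decrypt -o "$2" "$1"
}

echo "january"  > "$WORK/jan.txt"
echo "february" > "$WORK/feb.txt"
for f in jan feb; do
  encrypt_file "$WORK/$f.txt" "$WORK/$f.txt.gpg"
done
decrypt_file "$WORK/feb.txt.gpg" "$WORK/feb.restored.txt"
```

The key file then has to be backed up somewhere other than the encrypted backups themselves, or losing the machine means losing access to everything.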

I do something conceptually very close to that for my own backups. I have an lzipped, encrypted file containing the symmetric key for my backups, and the key for that file is in my normal password safe. So it is all automatic as long as my password safe is unlocked. (The key for the backups is also backed up in the password vault, but I use a two-step process for integrity and paranoia.) I have realized, though, that you quickly end up re-implementing standard CMS-like key encryption keys and envelopes in homebrew tools. :smiling_face:
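
That two-level setup is essentially envelope encryption: a random data key encrypts the backups, and a key-encryption key (the master passphrase) protects the data key. A sketch of the idea using gpg for both layers, which is a swapped-in choice (steinar uses an lzipped file and his own tool); the master passphrase comes from a hypothetical `MASTER_PASS` environment variable standing in for the password safe. GnuPG 2.1 or later and bash (for process substitution) are assumed.

```shell
#!/usr/bin/env bash
# Sketch: envelope encryption with gpg (random data key wrapped by a master passphrase).
set -euo pipefail

WORK="$(mktemp -d)"
export GNUPGHOME="$WORK/gnupg"; mkdir -m 700 "$GNUPGHOME"

# Hypothetical: in practice this would come from your password manager.
MASTER_PASS="${MASTER_PASS:-master-demo-passphrase}"
GPG=(gpg --batch --yes --quiet --pinentry-mode loopback)

# 1. Once: create a random data key and store it only in wrapped (encrypted) form.
#    The data key arrives on stdin; the master passphrase is fed on fd 3.
head -c 32 /dev/urandom | base64 |
  "${GPG[@]}" --passphrase-fd 3 --symmetric --cipher-algo AES256 \
      --s2k-digest-algo SHA256 -o "$WORK/backup.key.gpg" 3<<<"$MASTER_PASS"

# Prints the unwrapped data key to stdout (never written to disk).
unwrap_key() {
  "${GPG[@]}" --passphrase-fd 3 --decrypt "$WORK/backup.key.gpg" 3<<<"$MASTER_PASS"
}

# 2. Each backup: encrypt with the data key, passed to gpg on fd 3.
mkdir -p "$WORK/data"; echo "lab results" > "$WORK/data/lab.txt"
tar -czf - -C "$WORK" data |
  "${GPG[@]}" --passphrase-fd 3 --symmetric --cipher-algo AES256 \
      --s2k-digest-algo SHA256 -o "$WORK/backup.tar.gz.gpg" 3< <(unwrap_key)

# 3. Restore: unwrap the data key again and decrypt.
mkdir "$WORK/restore"
"${GPG[@]}" --passphrase-fd 3 --decrypt "$WORK/backup.tar.gz.gpg" 3< <(unwrap_key) |
  tar -xzf - -C "$WORK/restore"
```

A practical benefit of the envelope: changing the master passphrase only means re-wrapping the small key file, not re-encrypting every backup.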

For a mostly out-of-the-box experience (why reinvent the wheel?), I recommend BorgBackup (available in the Fedora repos), for a number of reasons:

  • stable and actively maintained
  • de-duplication for keeping backup sizes down and making each subsequent “full” backup very quick (e.g. under 5 minutes, depending on the number of changes; no need for partial/incremental backups)
  • encryption built-in as an option (recommended)
  • various compression options built-in
  • works over SSH

The downsides include a bit of a learning curve and a lack of multithreading support. The latter is on the roadmap, but don’t hold your breath.

For automating borg, I highly recommend borgmatic (also in Fedora repos).

  • very actively maintained with an extremely responsive developer who is receptive to feedback and suggestions
  • makes it easier and faster to use, automate, schedule, and get notifications of borg backups

Depending on where you want to store your backups and how large they are, you can choose from some cloud backup providers who support borg, or copy to your own cloud solution, or (preferably) use at least one local backup copy.

I use an Odroid HC2 running openmediavault as a simple and inexpensive NAS for the backups, plus occasionally copy the backup repo to another HDD with rsync. This setup has been working great for me.

Well, in that case, put it on a quality external drive and keep it in a safe deposit box or a fireproof safe. If you buy two and swap them every week or month (or whatever interval suits you), and the storage is offsite, you will always have at least one backup, even if you get into an accident on your way there. Encrypt the whole drive; then you don’t have to encrypt the files individually, though you still could. Since the data is never transmitted over the internet, all of those potential attack vectors are eliminated. If it is in your physical possession or in a trusted physical location, nobody can hold your data hostage on the internet. VERY simple. All for the cost of two USB drives and maybe a safety deposit box or a fireproof safe. Everybody should have a fireproof safe.
This is the sort of methodology that has been used for ultra secure data storage for decades if not centuries. Only the technology has changed. I hope this helps. Most people today don’t give a damn about their personal data. The fact that you value your own data speaks volumes to me.
I personally don’t trust cloud storage. There are about a thousand good reasons why but I’m a dinosaur that asks silly questions like, “What happens when it rains?” Does the company that you rely on for cloud storage have any sort of guarantee that your data is safe? How about a guarantee that they won’t copy it? Any guarantee that they will be in business next week? I’m one of those crazies that tells people to not put personal information on Facebook too.

I haven’t had time to thank you all for your contributions yet. The solution from steinar and cptgraywolf seems fine to me. Now I have to figure out how to implement it efficiently.
@fasulia I looked at borg and it seems an interesting project. But I like to have control over every step of the process. I’m no expert, but a toolchain of simpler tools seems more secure to me.
@jtkleeme I will seriously consider having at least one external disk for backing up my data. (I do not have a FB account :slight_smile:). Best regards. Paolo

I would like to point out a quick ‘gotcha’ when using compression with ‘tar’: bzip2 or xz can be unreasonably slow, depending on the size and complexity of the data.

You can use environment variables to set the compression level, or the ‘-I’ option in ‘tar’ to specify, for example, a multithreaded version of the compression program (like pigz, pbzip2, or pxz). Either way, it’s something to keep in mind if the speed of creating/extracting the archives is a concern.
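
A sketch of both approaches, assuming GNU tar and xz 5.2 or later (for `-T0`, which lets xz use all CPU cores); pigz is used when it is installed, otherwise plain gzip. The sample data is a made-up stand-in.

```shell
#!/usr/bin/env bash
# Sketch: faster compression with tar via -I or the XZ_OPT environment variable.
set -euo pipefail

WORK="$(mktemp -d)"
mkdir -p "$WORK/data"; seq 1 100000 > "$WORK/data/numbers.txt"

# Option 1: -I (--use-compress-program) with a parallel gzip if available.
if command -v pigz >/dev/null; then GZ=pigz; else GZ=gzip; fi
tar -I "$GZ" -cf "$WORK/data.tar.gz" -C "$WORK" data

# Option 2: xz reads extra options from XZ_OPT (here: all cores, preset 6).
XZ_OPT='-T0 -6' tar -cJf "$WORK/data.tar.xz" -C "$WORK" data
```

Both archives stay in the standard formats, so they can be extracted later with plain `tar -xzf` or `tar -xJf` on a machine without pigz.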

There may be other caveats as well, but they shouldn’t be too big of a deal.
Good luck. I’d actually be interested to see what you end up with. (Maybe write the script and add it to some form of git repo?)

I’m not a good coder, but I will try to put something together. Thanks!

And that is why the more secure option is code reuse. Unless there are very unique or very simple circumstances, chances are existing tools will do a better, quicker, and more robust job.