For most people, the importance of having backups of digital files is hard to overemphasize. Plenty of blog posts such as this and this have gone into depth on how backups protect against a variety of terrible data-destroying scenarios. Plenty of things can make data truly impossible to recover, such as theft, data corruption, hard drive failure (happened to me twice, I was mostly safe), accidental deletion (happened to me once, just barely saved my butt!), or even, very realistically, a house fire. Some good principles are to keep at least 3 copies of any file, to spread those copies across different media or hard drives, and ideally to keep one copy in a different geographical location than the others. I'd say it is just as important that actually performing the backup is easy or effortless; otherwise laziness can take over until your most recent backup is over four months old.

My needs

My specific needs were to maintain backups of my main Linux laptop, which I use for everything - recreation, development, learning, working from home, gaming... everything. I have a 512 GB NVMe M.2 SSD that holds my system files and most everything in my home folder that benefits from super fast disk access. I also have a 2 TB internal old-fashioned hard drive that lets me cheaply carry mountains of files with me, such as old file backups, photo and recording archives, disk images, operating system .iso images, my music and video libraries, large program resources, virtual machines, and the like. All of my files, wherever they live, are encrypted at the disk level, so the data is unreadable if my laptop or hard drives are stolen.

My previous solution

Up until now, I've been using rsync to efficiently mirror folders that I care about to an external hard drive and, occasionally, to a remote folder on a computer downstairs. It was better than nothing, but I knew that eventually I'd want to invest in learning a better tool. One bad property of that setup was that nothing was compressed, and I only have so many enormous hard drives to fill up. Another was that I had no history of backups: every time I ran my backup command, I overwrote the previous backup with the current files. It's entirely possible to not notice destroyed or corrupted data until weeks or months after it happens, and by then the damaged version has replaced the good one in the backup, so the data is unrecoverable. A plus was that the backups were incremental - once a huge file or folder was copied, rsync was smart enough not to copy it again. I knew that I wanted to replace my rsync solution with something that kept its pros without its cons, so I started my search months ago.

Finding a better solution

When looking for a good backup program, these comparison charts were a good place to start for me. I was more interested in incremental backups than in file synchronization (for which I'd definitely recommend Syncthing) or network-oriented backup programs, since it was just my one computer that I was most concerned about. Of those listed, Borg Backup truly lit up the table with green, so I decided to check it out. It had everything I was looking for: compression, de-duplication, incremental updates, and more (check out the website for the full list). The basic idea is to have a repository, inside of which you create and manage archives. Each archive contains the result of one backup, compressed and stored with no unnecessary copies, so one can keep backups that extend back for months or years without wasting space. One can even mount these archives to the file system and explore their contents from a file explorer! I soon installed Borg on my system and started exploring the documentation, but didn't actually use it for a few months.
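To make the repository and archive idea concrete, here is a rough sketch of the basic workflow, assuming Borg 1.x command syntax. The repository path, the mount point, the archive name first-try, and the Documents folder are all made-up placeholders. The BORG variable is a hypothetical convenience so the commands can be previewed with BORG=echo before running them for real.

```shell
# Hypothetical locations; adjust to your own drives.
REPO="${REPO:-/mnt/backup/borg-repo}"
MOUNTPOINT="${MOUNTPOINT:-$HOME/borg-mount}"

# Set BORG=echo to print each command instead of running it.
BORG="${BORG:-borg}"

# Create an empty, encrypted repository (done once).
init_repo() { "$BORG" init --encryption=repokey "$REPO"; }

# Store a folder as a new archive named "first-try" (placeholder name).
first_backup() { "$BORG" create --stats "$REPO::first-try" "$HOME/Documents"; }

# Show every archive currently held in the repository.
list_archives() { "$BORG" list "$REPO"; }

# Mount one archive so it can be browsed like a normal folder (needs FUSE).
browse() { mkdir -p "$MOUNTPOINT" && "$BORG" mount "$REPO::first-try" "$MOUNTPOINT"; }

# Unmount the archive when finished browsing.
done_browsing() { "$BORG" umount "$MOUNTPOINT"; }
```

After sourcing this, calling init_repo, first_backup, and list_archives in order would walk through creating a repository and checking that the first archive landed in it.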

Setting up the script

Tonight, I decided to actually write a script to automatically perform backups of everything I cared about keeping safe. Graphical programs can be useful and easy, but if one can get a program into a script that runs from the command line, then one can do any number of complicated tasks with something as easy as pressing a shortcut key. I based my script on the one at the bottom of Borg Backup's Quick Start page, with some modifications. I already had a bkexclude file from when I was using rsync that specified all the folders and types of files that I didn't want to back up - mostly caches, development tool repositories that could easily be reinstalled, and temporary files. I was able to use it by passing an --exclude-from /home/jeff/bin/bkexclude argument to borg create. I also used "repokey" encryption for added protection on a live system (because disk encryption doesn't protect a system from being exploited while it's powered on).
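As a rough illustration (not the exact script), a backup script along these lines might look like the sketch below, assuming Borg 1.x syntax. The repository path, the /home/jeff source folder, the lz4 compression choice, the archive naming pattern, and the retention counts are all assumptions; only the --exclude-from path comes from my actual setup. The BORG variable is a hypothetical hook for previewing the commands with BORG=echo.

```shell
#!/bin/sh
# Hypothetical repository location; borg reads it from BORG_REPO.
BORG_REPO="${BORG_REPO:-/mnt/backup/borg-repo}"
export BORG_REPO

# Set BORG=echo to print the commands instead of running them.
BORG="${BORG:-borg}"

# Create a new archive named after the host and the current time.
backup_home() {
    "$BORG" create \
        --verbose --stats \
        --compression lz4 \
        --exclude-from /home/jeff/bin/bkexclude \
        "::{hostname}-{now}" \
        /home/jeff
}

# Thin out old archives; the retention counts here are only examples.
prune_old() {
    "$BORG" prune --list \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6
}

# Run both steps only when explicitly asked, e.g. `sh backup.sh run`.
if [ "${1:-}" = "run" ]; then
    backup_home && prune_old
fi
```

Running it once with BORG=echo is a cheap way to check the assembled command lines before letting it touch a real repository.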

To avoid having to type in my password every time, and because I was using GNOME 3 as my desktop environment, I saved my password in the GNOME keyring. A keyring is a safe place to put things like passwords because it stays encrypted on disk and can be unlocked with the computer password. It is already used by programs like Chrome to safely store sensitive data, and it is where passwords that you instruct the system to remember are saved. It is also well managed by the operating system by default, having been set up during installation. To use it from my Borg script, I told Borg to use the secret-tool command (man page here) to look up the password from the keyring, by setting the BORG_PASSCOMMAND variable in my script to something like secret-tool lookup borg-repo passphrase.
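In practice the keyring setup comes down to two pieces, sketched below. The store command is shown as a comment because it prompts for the passphrase interactively, and its label text is just an example I made up. The attribute and value pair borg-repo passphrase has to match between the store and lookup commands.

```shell
# One-time setup, run by hand: prompts for the passphrase and saves it in the keyring.
#   secret-tool store --label='Borg repository passphrase' borg-repo passphrase

# In the backup script: Borg runs this command and reads the passphrase from its output.
export BORG_PASSCOMMAND='secret-tool lookup borg-repo passphrase'
```

With that variable exported, borg commands later in the same script should no longer stop to ask for the passphrase, as long as the keyring is unlocked.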

Tested and working!

With computers, one of the most satisfying things is to type a command and watch reams of text fly up a terminal window as the computer goes to work to generally improve the state of things in your life. Now that I know I've found a good program for my backups and have started keeping them better, I can improve on and expand this system to bring it closer to perfection, and make it a model of how to do safe and reliable backups for whatever I may happen to do in the future.