I had a busy weekend. Seems that the best way to get motivated to work on some of the older projects is to be left with a lot of time on your hands. In this case, I was recovering from an operation (nothing major) so I wasn’t allowed to do much.
So I took a stab at [iceshelf], an old project designed to make the most of Amazon’s [glacier] backup service. As some of you may know, this is essentially a tape-backup-as-a-service (tbaas! … no? ok). Cost of storing data is extremely low and you can even choose in which AWS data center you want your data saved. Which is handy when you’re trying to plan for the impending doom after the latest election. Not a good idea to store data close to the US.
But! All is not cheap: any kind of modification or retrieval of the data is quite expensive. Which makes sense since we’re talking about tape, not SSD or HDD. So in short, it’s great for storing lots of data which you never intend to change or retrieve. Now, the latter is something you hopefully never will have to do, and the former can be avoided if you plan accordingly.
This is where [iceshelf] comes in. It’s an incremental backup system designed to produce archive files which hold the changes of your file contents. So in other words, if a file changes, it will store a new copy instead of modifying the existing one.
It also tries to accommodate the post-Snowden world by adding both encryption and signature to the archived data using GnuPG. [Iceshelf] even goes as far as to try and compensate for any potential durability issue by adding parity files which can compensate for a certain amount of data corruption.
Alright already, I hear you cry, get on with it, how do I use this tool?
Before we begin, please note that this guide assumes you’re running on Ubuntu and that you have root access.
With that out of the way, let’s get started.
Prerequisites
First, you need to install all the necessary dependencies
sudo apt-get install gpg par2 git python-dev python-pip
This will install the following tools
- GnuPG
  - should be pre-installed, but adding it won’t hurt
- PAR2
  - tool which can create, verify and repair using parity files
- Git
  - to retrieve [iceshelf] and [amazon-glacier-cmd-interface]
- Python development environment and PIP
  - required for the GnuPG wrapper we’re going to install
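Before moving on, it can be worth confirming that the tools actually ended up on your PATH. The following is a minimal sketch, not part of iceshelf: missing_tools is a hypothetical helper, and the command names passed to it are assumptions about what the packages above provide (your system may call them something slightly different, for example python2 instead of python).

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names assumed to be provided by the packages installed above.
missing_tools gpg par2 git python pip
```

If nothing is printed, everything was found. Any name that is printed needs another look before you continue.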
Next up, we need to install the Python wrapper for GnuPG. Unfortunately, there are two versions of this, so it’s very important that you install the correct one, since the incorrect variant lacks functionality that iceshelf needs.
sudo pip install python-gnupg
Last one to be installed is the glacier command for uploading/downloading files from an Amazon Glacier vault. This is a bit more involved since there is no ready-made package yet (to my knowledge).
You’re going to download the source in order to build it, so you’ll want to be in a folder where you’re OK with creating additional directories. I used the home directory of my normal user.
Now, let’s download it and install it
git clone https://github.com/uskudnik/amazon-glacier-cmd-interface.git
cd amazon-glacier-cmd-interface/
sudo python setup.py install
(You can delete the downloaded folder now if you wish)
Installing iceshelf
Start by adding a new user who will run the backup. I chose iceshelf, which as you can see leads to some fun path names but otherwise has no impact. Feel free to change it to something else, just make sure to follow through with all relevant changes.
sudo adduser --system --disabled-password iceshelf
This sets up a new user without a password, so no one can log in as that user. Now, having said that, we’re going to “log in” as the user by virtue of changing our identity.
sudo sudo -Hu iceshelf bash
While it might look like a copy-n-paste error, this changes you to root, who then executes bash as iceshelf while also making sure the HOME variable points to /home/iceshelf. The benefit is that we now appear as the user iceshelf and that our home directory is changed to /home/iceshelf.
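If you want to double-check that the switch worked, you can compare the current user and HOME against what you expect. This is only a small sketch; running_as is a hypothetical helper name, and the user and directory are the ones used in this guide.

```shell
# Succeed only when the current shell runs as the given user
# and HOME points at the given directory.
running_as() {
    [ "$(id -un)" = "$1" ] && [ "$HOME" = "$2" ]
}

if running_as iceshelf /home/iceshelf; then
    echo "running as iceshelf with HOME=$HOME"
else
    echo "not iceshelf: user=$(id -un) HOME=$HOME"
fi
```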
As iceshelf, we will now proceed to download the tool to the home folder
cd
git clone https://github.com/mrworf/iceshelf.git
Next, iceshelf also needs a gpg key
~$ gpg --gen-key
gpg (GnuPG) 1.4.16; Copyright (C) 2013 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
gpg: directory `/home/iceshelf/.gnupg' created
gpg: new configuration file `/home/iceshelf/.gnupg/gpg.conf' created
gpg: WARNING: options in `/home/iceshelf/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/home/iceshelf/.gnupg/secring.gpg' created
gpg: keyring `/home/iceshelf/.gnupg/pubring.gpg' created
Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
Your selection?
We’re going to choose 1
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0)
I don’t want my backup key to expire, so we simply hit enter, which gives us the default of zero, no expiration.
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N)
It sure is, so answer y and continue
Is this correct? (y/N) y
You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
"Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"
The next couple of questions are up to you. The only thing you MUST change is the email, since it does not make any sense to use mine.
Real name: Iceshelf
Email address: iceshelf@sensenet.nu
Comment: Automated backup
You selected this USER-ID:
"Iceshelf (Automated backup) <iceshelf@sensenet.nu>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?
Almost done, we need to (o)kay this and continue
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
You need a Passphrase to protect your secret key.
You may or may not get a message saying “gpg: gpg-agent is not available in this session”, which is fine. If you get the message, it just means gpg was unable to use a fancy graphical UI to ask you for a passphrase.
In my case, I chose no passphrase (simply hitting enter) but if you want additional security, you can add something here.
JUST MAKE SURE TO REMEMBER IT
You don't want a passphrase - this is probably a *bad* idea!
I will do it anyway. You can change your passphrase at any time,
using this program with the option "--edit-key".
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
............+++++
...+++++
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
............+++++
+++++
gpg: /home/iceshelf/.gnupg/trustdb.gpg: trustdb created
gpg: key xxxxxxxx marked as ultimately trusted
public and secret key created and signed.
gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u
pub 4096R/xxxxxxxx 2016-11-15
Key fingerprint = xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx
uid Iceshelf (Automated backup) <iceshelf@sensenet.nu>
sub 4096R/xxxxxxxx 2016-11-15
Congratulations, you’ve just created a key that can be used for encrypting your backup as well as signing the files associated with it (never mind the x’s in the text above, I didn’t really feel like sharing those details).
Of course, now you might want to make a backup of this key. One way is to export the private key and save it somewhere safe
gpg --export-secret-keys iceshelf@sensenet.nu >secret-key.gpg
which is probably not a bad idea. But if you’re paranoid enough to encrypt and sign, you’re probably also aware that USB keys lose data and so do SSDs ([bitflip]), and hard drives with spinning platters will fail due to mechanical issues or simply gum up if you let them sit without power. CD-Rs are fairly good if you burn at 1x speed so the pattern is clear, but even they can fail.
So what are you supposed to do? Well, let’s turn to paper. Yes, of all easily available material, paper is the one most likely to survive the longest as long as it’s kept reasonably dry and out of sunlight. I mean, they still find parchment paper and we have books which are 100+ years old. Just saying.
To accomplish a paper backup of your key, there are a few options available to you. You have paperback, paperkey and more.
There is also a little helper in the other folder of iceshelf, where you’ll find a script called analog-key.sh which can aid you with producing a printable key. The same script will also parse the printout and reconstitute it back into a gpg key.
But whatever you decide to do, please make a backup of the key, because once it’s lost, so are your backups.
Creating a config
A good starting point for your configuration is the example conf file, iceshelf.sample.conf, which is why we’re starting with that as a baseline.
To keep things simple, we’ll stay as the iceshelf user (check using id to see what user you’re currently logged in as). If you logged out previously, fret not, just issue
sudo sudo -Hu iceshelf bash
And you’ll be back as mr iceshelf.
Now, let’s copy the configuration and open it up so we can edit it
cd
cp iceshelf/iceshelf.sample.conf iceshelf.conf
nano -w iceshelf.conf
What to backup?
Locate the [sources] section and get ready to add the files and/or directories you’d like to back up. Remember, all paths should be entered in the absolute form, meaning that you provide every directory starting from root. So if you’d like to back up the iceshelf user, you’d add this:
iceshelf=/home/iceshelf
The iceshelf part before the equal sign is just for your reference. Just keep in mind that it must be unique or only the last version of it will be used. The part after the equal sign is the directory or file you want to have backed up. As you can see, we wrote /home/iceshelf since that’s the absolute path of the user’s home directory.
This was just an example, and while you can backup iceshelf with iceshelf, there rarely is a need for it and I would highly recommend taking some time to consider which folders and files you want backed up.
While it may seem like a great idea to back up the key itself, it’s pointless: the backup is encrypted with that very key, so if the key is lost, the copy inside the backup is impossible to retrieve.
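The two rules for [sources] entries (unique names, absolute paths) are easy to get wrong once the list grows. Here is a rough sketch of a checker you could paste your entries into. It is not part of iceshelf: check_sources is a hypothetical helper, the photos entries are made-up examples, and it assumes names contain no spaces.

```shell
# Read "name=path" lines on stdin and report entries that would misbehave:
# relative paths, and names that occur more than once.
check_sources() {
    seen=""
    while IFS='=' read -r name path; do
        [ -n "$name" ] || continue
        case $path in
            /*) ;;                                   # absolute, fine
            *) echo "not absolute: $name=$path" ;;
        esac
        case " $seen " in
            *" $name "*) echo "duplicate name: $name" ;;
        esac
        seen="$seen $name"
    done
}

# Hypothetical entries: the second "photos" is both relative and a duplicate.
check_sources <<'EOF'
iceshelf=/home/iceshelf
photos=/srv/photos
photos=media/photos
EOF
```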
Setting some good options
There are quite a few options to tweak your backup, so I’m just going to highlight the good ones here and you can explore the rest in the sample configuration’s documentation or the README.md file. Most of the defaults are sensible and will not require review for this initial setup.
detect move: no
This is the default behavior, which for most people is fine. What it means in reality is that if you have a file called IMAGE.JPG and you renamed it to MONALISA.JPG, iceshelf would record two changes: the deletion of IMAGE.JPG and the addition of MONALISA.JPG. While this is correct, it has one obvious drawback: if you have a tendency to move or rename a lot of the content you’re backing up, it will generate an unnecessary amount of extra data since each move or rename essentially duplicates the file.
By changing this option to yes
detect move: yes
Iceshelf will now do its best to detect when you move or rename a file. When it detects the change, it will simply record the change, and not add a new version of the same file. It will do this for as many times as you move or rename the file, always tracking the original and just storing the change of name/location. However, the moment you modify the file, it will upload a new copy of the modified file.
Why is this behavior merely mentioned and not enabled by default? Because it can be difficult to understand the side effects. Any move/rename with this option enabled will result in a single entry in the manifest.json file. It also complicates restoration of a backup, since you now have to manually rename or move the file into place. But if you have a lot of files which move around, this is by far the most efficient way of dealing with it.
But I would suggest testing it out first so you understand how it works.
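To get a feel for the idea behind move detection, here is a rough sketch: if a file that vanished and a file that appeared have the same checksum, the change can be treated as a rename instead of a delete plus an add. This is not iceshelf’s actual implementation. same_content is a hypothetical helper and cksum is simply a convenient stand-in for whatever checksum the tool really uses.

```shell
# Return success when two files have identical content, judged by checksum and size.
same_content() {
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Demonstration in a throwaway directory.
demo=$(mktemp -d)
printf 'pretend this is a photo\n' > "$demo/IMAGE.JPG"
cp "$demo/IMAGE.JPG" "$demo/before"          # stand-in for what the last backup recorded
mv "$demo/IMAGE.JPG" "$demo/MONALISA.JPG"    # the rename

if same_content "$demo/before" "$demo/MONALISA.JPG"; then
    echo "IMAGE.JPG -> MONALISA.JPG looks like a rename, no new copy needed"
fi
rm -r "$demo"
```

The moment the content changes, the checksums no longer match, which fits the behavior described above: a modified file gets uploaded as a new copy.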
Another handy option is
skip empty: no
Again, by default, it’s not enabled. But when it’s enabled, it will avoid creating a backup if there are no changes. With skip empty set to no, a manifest.json is created even if there are no changes. This manifest is of course empty. Why would you want this? To be consistent: you know that the backup ran but no changes were detected.
Setting it to yes
skip empty: yes
will make Iceshelf cancel the backup if no changes are detected. Benefit? No empty manifest files and no unnecessary uploads to Amazon Glacier.
Security settings
First thing we’re going to do is add your newly created key. Since mine is called iceshelf@sensenet.nu, we’re adding it to both encryption and signature. Find the [security] section and add the following
encrypt: iceshelf@sensenet.nu
sign: iceshelf@sensenet.nu
This will make sure that we can validate that no one has tampered with the backup (or more likely, confirm it’s not corrupt) and also obfuscate the data so no one but me can access it. Since I didn’t put in a passphrase for the key, I’m safely ignoring the encrypt phrase and sign phrase options.
I’m also turning on a 5% parity, allowing up to 5% of the contents to be damaged
add parity: 5
This will of course grow the backup by 5% so keep this in mind if you decide to go up in percentages. And really, while 100% sounds cool, it isn’t really a good idea. If you’re this worried about data loss, you’re better off making multiple backups in multiple locations. Typically 5% is enough and I’d be hesitant to go above 15%.
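To put numbers on it, the extra space is simply the archive size times the percentage. A quick sketch of the arithmetic follows; parity_bytes is a hypothetical helper and the 1 GB archive size is a made-up example.

```shell
# Approximate bytes of parity data for an archive of a given size.
# Integer arithmetic, so the result is rounded down.
parity_bytes() {
    echo $(( $1 * $2 / 100 ))
}

archive=1000000000   # a hypothetical 1 GB archive
for percent in 5 15 100; do
    echo "$percent% parity: $(parity_bytes "$archive" "$percent") extra bytes"
done
```

Treat the result as an estimate. The parity files themselves may carry a little overhead on top of this.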
Glacier
Last step is setting up glacier, which you do under the [glacier] section. It’s quite simple: you just point out the glacier configuration file and a suitable name for your vault. I’m not going to go into details about setting up glacier, that is something I recommend you check out on the home page of the tool iceshelf uses.
For me, once configured, I simply add the following:
config: /home/iceshelf/glacier.conf
vault: iceshelf-backup
No glacier?
While the tool was created to allow backup using the Amazon Glacier storage, it will function just as well without glacier. To disable the glacier integration, you can simply comment out or remove the entries under the [glacier] section, and iceshelf will simply skip that step and move the finished backup into the previously defined done dir.
Trying it out
Now that you have a config file for iceshelf, it’s time to try it out. Make sure you’re still running as iceshelf, otherwise look in the previous section for instructions on changing user.
The easiest way to try it is to run in --changes mode, which will do everything except create a backup
/home/iceshelf/iceshelf/iceshelf --changes /home/iceshelf/iceshelf.conf
Depending on your configuration, this may take a while, but in the end, this is what you should see
max size is limited to 32GB when using parity, changing setting accordingly
First run, no previous checksums
Setting up the prep directory
Checking sources for changes
Processing "iceshelf" (/home/iceshelf)
Detected changes:
"/home/iceshelf/iceshelf/exclusions/dovecot.excl" is new
"/home/iceshelf/iceshelf/testsuite/README.md" is new
"/home/iceshelf/iceshelf/.git/hooks/pre-applypatch.sample" is new
"/home/iceshelf/iceshelf/.git/hooks/pre-rebase.sample" is new
"/home/iceshelf/.bash_history" is new
"/home/iceshelf/iceshelf/testsuite/test.sh" is new
"/home/iceshelf/iceshelf/TODO.md" is new
"/home/iceshelf/.gnupg/trustdb.gpg" is new
"/home/iceshelf/iceshelf/.git/info/exclude" is new
"/home/iceshelf/iceshelf/.git/hooks/pre-push.sample" is new
"/home/iceshelf/iceshelf/other/README.md" is new
"/home/iceshelf/iceshelf/.git/HEAD" is new
"/home/iceshelf/iceshelf/modules/configuration.py" is new
"/home/iceshelf/iceshelf/modules/glacier.py" is new
"/home/iceshelf/iceshelf/testsuite/.gitignore" is new
"/home/iceshelf/iceshelf/modules/fileutils.py" is new
"/home/iceshelf/iceshelf/exclusions/README.md" is new
"/home/iceshelf/iceshelf/.git/description" is new
"/home/iceshelf/iceshelf/iceshelf" is new
"/home/iceshelf/iceshelf/modules/configuration.pyc" is new
"/home/iceshelf/iceshelf/.git/hooks/commit-msg.sample" is new
"/home/iceshelf/iceshelf/modules/helper.py" is new
"/home/iceshelf/iceshelf/.gitignore" is new
"/home/iceshelf/iceshelf/modules/glacier.pyc" is new
"/home/iceshelf/iceshelf/.git/logs/refs/heads/master" is new
"/home/iceshelf/iceshelf/.git/hooks/prepare-commit-msg.sample" is new
"/home/iceshelf/iceshelf/.git/config" is new
"/home/iceshelf/iceshelf/.git/logs/HEAD" is new
"/home/iceshelf/iceshelf/modules/__init__.pyc" is new
"/home/iceshelf/iceshelf/DATABASE.md" is new
"/home/iceshelf/iceshelf/LICENSE" is new
"/home/iceshelf/.gnupg/secring.gpg" is new
"/home/iceshelf/iceshelf/.git/hooks/pre-commit.sample" is new
"/home/iceshelf/.gnupg/pubring.gpg~" is new
"/home/iceshelf/iceshelf/other/analog-key.sh" is new
"/home/iceshelf/iceshelf/.git/packed-refs" is new
"/home/iceshelf/iceshelf/.git/objects/pack/pack-f30d8d9756cc8bb8c962fe6cf8ccaf6889c02edf.pack" is new
"/home/iceshelf/.gnupg/pubring.gpg" is new
"/home/iceshelf/iceshelf/modules/fileutils.pyc" is new
"/home/iceshelf/.gnupg/random_seed" is new
"/home/iceshelf/iceshelf/modules/helper.pyc" is new
"/home/iceshelf/iceshelf/.git/hooks/update.sample" is new
"/home/iceshelf/iceshelf/.git/refs/remotes/origin/HEAD" is new
"/home/iceshelf/iceshelf/.git/hooks/applypatch-msg.sample" is new
"/home/iceshelf/iceshelf/.git/refs/heads/master" is new
"/home/iceshelf/iceshelf/.git/index" is new
"/home/iceshelf/iceshelf/.git/objects/pack/pack-f30d8d9756cc8bb8c962fe6cf8ccaf6889c02edf.idx" is new
"/home/iceshelf/iceshelf.conf" is new
"/home/iceshelf/.lesshst" is new
"/home/iceshelf/iceshelf/.git/logs/refs/remotes/origin/HEAD" is new
"/home/iceshelf/iceshelf/.git/hooks/post-update.sample" is new
"/home/iceshelf/.gnupg/gpg.conf" is new
"/home/iceshelf/iceshelf/iceshelf.sample.conf" is new
"/home/iceshelf/iceshelf/modules/__init__.py" is new
"/home/iceshelf/iceshelf/README.md" is new
"/home/iceshelf/iceshelf/other/iceshelf.service" is new
===============
56 files (305K) to be backed up
One thing you’ll realize immediately is that this backup contains a lot of files we didn’t want. Which is why it’s a good idea to do a “dry run” first so you don’t backup a bunch of unnecessary files.
Back to the configuration file, it’s time to learn about [exclude] and what it can do for you.
Excluding files
If you read through the configuration file, you might find it daunting in terms of what exclude can do. But in reality, it’s fairly simple. By telling iceshelf which files, directories or file sizes you’re not interested in, it will filter those out during backup. In our case, we want to get rid of any .git folder and the .gnupg folder.
nogit=?/.git/
nogpg=?/.gnupg/
Another test run and
"/home/iceshelf/iceshelf/exclusions/dovecot.excl" is new
"/home/iceshelf/iceshelf/testsuite/README.md" is new
"/home/iceshelf/.bash_history" is new
"/home/iceshelf/iceshelf/testsuite/test.sh" is new
"/home/iceshelf/iceshelf/TODO.md" is new
"/home/iceshelf/iceshelf/other/README.md" is new
"/home/iceshelf/iceshelf/modules/configuration.py" is new
"/home/iceshelf/iceshelf/modules/glacier.py" is new
"/home/iceshelf/iceshelf/testsuite/.gitignore" is new
"/home/iceshelf/iceshelf/modules/fileutils.py" is new
"/home/iceshelf/iceshelf/exclusions/README.md" is new
"/home/iceshelf/iceshelf/iceshelf" is new
"/home/iceshelf/iceshelf/modules/configuration.pyc" is new
"/home/iceshelf/iceshelf/modules/helper.py" is new
"/home/iceshelf/iceshelf/.gitignore" is new
"/home/iceshelf/iceshelf/modules/glacier.pyc" is new
"/home/iceshelf/iceshelf/modules/__init__.pyc" is new
"/home/iceshelf/iceshelf/DATABASE.md" is new
"/home/iceshelf/iceshelf/LICENSE" is new
"/home/iceshelf/iceshelf/other/analog-key.sh" is new
"/home/iceshelf/iceshelf/modules/fileutils.pyc" is new
"/home/iceshelf/iceshelf/modules/helper.pyc" is new
"/home/iceshelf/iceshelf.conf" is new
"/home/iceshelf/.lesshst" is new
"/home/iceshelf/iceshelf/iceshelf.sample.conf" is new
"/home/iceshelf/iceshelf/modules/__init__.py" is new
"/home/iceshelf/iceshelf/README.md" is new
"/home/iceshelf/iceshelf/other/iceshelf.service" is new
is going to be backed up now. Much better, don’t you think? But we can do better. Let’s get rid of all *.pyc files. We also don’t want the .bash_history or .lesshst files.
nopyc=*.pyc
nobash=/home/iceshelf/.bash_history
noless=/home/iceshelf/.lesshst
And we end up with
"/home/iceshelf/iceshelf/DATABASE.md" is new
"/home/iceshelf/iceshelf/LICENSE" is new
"/home/iceshelf/iceshelf/testsuite/README.md" is new
"/home/iceshelf/iceshelf/modules/fileutils.py" is new
"/home/iceshelf/iceshelf/exclusions/README.md" is new
"/home/iceshelf/iceshelf.conf" is new
"/home/iceshelf/iceshelf/exclusions/dovecot.excl" is new
"/home/iceshelf/iceshelf/other/analog-key.sh" is new
"/home/iceshelf/iceshelf/iceshelf" is new
"/home/iceshelf/iceshelf/testsuite/test.sh" is new
"/home/iceshelf/iceshelf/TODO.md" is new
"/home/iceshelf/iceshelf/modules/helper.py" is new
"/home/iceshelf/iceshelf/.gitignore" is new
"/home/iceshelf/iceshelf/modules/glacier.py" is new
"/home/iceshelf/iceshelf/other/README.md" is new
"/home/iceshelf/iceshelf/iceshelf.sample.conf" is new
"/home/iceshelf/iceshelf/modules/configuration.py" is new
"/home/iceshelf/iceshelf/modules/__init__.py" is new
"/home/iceshelf/iceshelf/README.md" is new
"/home/iceshelf/iceshelf/testsuite/.gitignore" is new
"/home/iceshelf/iceshelf/other/iceshelf.service" is new
Alright, we’re good to go.
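If you are curious how the five exclusion rules used above carve up the file list, here is a small sketch that mimics them. The reading of the prefixes is an assumption inferred from those rules (a leading ? matches anywhere in the path, a leading * matches the end of the path, no prefix matches the start of the path); the sample configuration remains the authority. excluded is a hypothetical helper, not something iceshelf provides.

```shell
# excluded PATH RULE: succeed when RULE would filter out PATH.
# Prefix meanings are assumptions, see the text above.
excluded() {
    case $2 in
        \?*) case $1 in *"${2#?}"*) return 0 ;; esac ;;   # ?text: text anywhere in the path
        \**) case $1 in *"${2#?}")  return 0 ;; esac ;;   # *text: path ends with text
        *)   case $1 in "$2"*)      return 0 ;; esac ;;   # text: path starts with text
    esac
    return 1
}

# Try the rules from this guide against a few paths from the dry run.
for file in /home/iceshelf/iceshelf/.git/HEAD \
            /home/iceshelf/iceshelf/modules/helper.pyc \
            /home/iceshelf/.bash_history \
            /home/iceshelf/iceshelf/README.md; do
    verdict=kept
    for candidate in '?/.git/' '?/.gnupg/' '*.pyc' \
                     /home/iceshelf/.bash_history /home/iceshelf/.lesshst; do
        if excluded "$file" "$candidate"; then
            verdict="excluded by $candidate"
            break
        fi
    done
    echo "$file: $verdict"
done
```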
Now, we could just run the tool manually, like so
/home/iceshelf/iceshelf/iceshelf /home/iceshelf/iceshelf.conf
But that seems a bit inconvenient, there has to be a better way. There is!
Using cronjob
If you look under the other folder, you’ll find a file called iceshelf-cronjob. This file contains a script which can be called from your crontab (the scheduler which is responsible for running certain tools at certain times). If you’ve followed the guide this far and stuck to the naming conventions used here for the configuration files and user setup, then you can simply drop the script into the /etc/cron.daily/ folder (or /etc/cron.weekly/ if you want it to run once a week instead).
But wait! If you copy the file, you won’t get any updates, so instead of copying the file into /etc/cron.daily/, let’s symlink it so it automatically gets updated when you refresh the git repository. If you’re still running as user iceshelf (check using the id command) then it’s time to exit by pressing CTRL-D.
Once you’re ready, let’s make the symlink
sudo ln -s /home/iceshelf/iceshelf/other/iceshelf-cronjob /etc/cron.daily/
That was it, the cronjob will now run every day using the /home/iceshelf/iceshelf.conf file and save any output to /var/log/iceshelf.log and also send you an email with the result (if you have configured email for your server).
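A symlink only helps as long as its target exists, so it does not hurt to verify it after creating it (and again if you ever move the git checkout). A minimal sketch, where link_ok is a hypothetical helper and the path is the one the ln command above produces:

```shell
# Succeed when the argument is a symbolic link whose target still exists.
link_ok() {
    [ -L "$1" ] && [ -e "$1" ]
}

link=/etc/cron.daily/iceshelf-cronjob
if link_ok "$link"; then
    echo "$link is in place"
else
    echo "$link is missing or points at a file that no longer exists"
fi
```

On Ubuntu, run-parts --test /etc/cron.daily should also list the scripts that would be run, which is another way to confirm the job is picked up.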
Congratulations, you now have a complete iceshelf setup which is set to run every day.
If you run into issues or have questions, feel free to post them on the github.com page.