We hear a lot about backups being neglected, and wrote an article some time ago about why backups are so important. Yet what can be equally infuriating is a backup policy that turns out to be less useful than it seemed when the time comes to actually use it. This article touches on the pros and cons of different approaches. In particular, we consider the speed of restoring a backup, compression to save space and encryption to protect the backed-up data.

Speed to restore

If you have a lot of important data (whole server backups, VM snapshots or lots of static content such as videos and images), speed is important. It is not unknown for a restore to take several days in these cases because of a sub-optimal backup choice. If you have a lot of data, your backup strategy must account for the speed and cost of restoring everything in a worst-case scenario. An excellent article explains why Amazon Glacier is the wrong choice for this.
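As a rough illustration of why restore speed matters, the sketch below estimates how long a full restore would take at a given sustained throughput. The function name, the 2 TB data size and the three speeds are hypothetical figures chosen for the example, not measurements from any provider, and real restores are usually slower once decompression, decryption and per-file overhead are included.

```python
def estimate_restore_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Return the hours needed to transfer data_gb at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = data_gb * 1000  # decimal units: 1 GB = 1000 MB
    seconds = total_mb / throughput_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical scenario: 2 TB of data at three sustained restore speeds.
    for speed in (10, 50, 200):
        hours = estimate_restore_hours(2000, speed)
        print(f"{speed:>4} MB/s -> {hours:6.1f} hours")
```

With these assumed numbers, 2 TB at a sustained 10 MB/s works out to roughly 55.6 hours, which is more than two days before anything else goes wrong.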

If you have double-digit gigabytes of data to restore, I strongly recommend against heavy compression or encryption for your backups. If the data is very sensitive, consider backing up to an encrypted disk rather than encrypting the backups directly, as it is often faster. Compression and deduplication (i.e. storing incremental changes over time) can also degrade the speed of your backups, sometimes to unusable levels. Finally, consider the geographic distance between the backup storage and your normal server: it should be great enough that the backup survives a disaster, yet small enough that the data doesn’t have to travel far.

Compression, deduplication and encryption

Compression means rewriting files in a more space-efficient way. It has a lot in common with deduplication, which keeps multiple versions of files by recording only the changes over time. Compression and deduplication are perfect for recovering from accidental deletions, as it’s easy to restore a handful of files from months or years ago. In these cases speed doesn’t matter much because, without compression and deduplication, you probably would not have been able to store the files for so long anyway.
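To make the idea concrete, here is a minimal sketch of a deduplicating, compressing store. It is a toy, not how any particular backup product works: it deduplicates whole files by SHA-256 content hash rather than recording fine-grained changes, it keeps everything in memory, and the class and method names (DedupStore, backup, restore) are invented for the example.

```python
import hashlib
import zlib
from typing import Dict, List


class DedupStore:
    """Toy content-addressed store: each unique file body is kept once, compressed."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}          # sha256 hex digest -> compressed bytes
        self.snapshots: List[Dict[str, str]] = []  # one dict per backup run: path -> digest

    def backup(self, files: Dict[str, bytes]) -> None:
        """Record a snapshot, storing only file bodies not seen before."""
        snapshot = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                self.blobs[digest] = zlib.compress(data)
            snapshot[path] = digest
        self.snapshots.append(snapshot)

    def restore(self, snapshot_index: int, path: str) -> bytes:
        """Return the original bytes of one file from one snapshot."""
        digest = self.snapshots[snapshot_index][path]
        return zlib.decompress(self.blobs[digest])


if __name__ == "__main__":
    store = DedupStore()
    store.backup({"a.txt": b"hello " * 1000, "b.txt": b"report v1"})
    store.backup({"a.txt": b"hello " * 1000})  # b.txt was deleted by accident
    print(store.restore(0, "b.txt"))           # still recoverable from the first snapshot
    print(len(store.blobs))                    # a.txt is stored once across both snapshots
```

The second backup adds no new data because the unchanged file is already stored, which is why this approach makes long retention affordable. The price is that every restore has to look up and decompress each blob, and that work adds up on a full restore.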

Encryption is a more complex subject. Platforms that claim to offer encryption often offer encryption that can be decrypted by someone else. This sort of encryption offers negligible security benefits in my opinion. Good encryption needs to be end-to-end, meaning that there is only one feasible way to decrypt the data at any stage: with the relevant key. The downside is that strong, end-to-end encryption can be slow, and combining it with compression can make restores very slow indeed. It also places additional load on your servers, potentially increasing hosting costs.

For sensitive data, such as e-commerce databases, separate the sensitive data and store it somewhere away from prying eyes. Tarsnap is an excellent choice for this sort of sensitive data. Although the pricing seems fuzzy, at least one of our clients stores data at Tarsnap at a surprisingly low cost.

RAID and Snapshots

One client found out the hard way this week that neither RAID nor snapshots were of any use when the hosting provider had a major incident. The incident corrupted his data, the snapshot was then overwritten with the corrupt data, and the provider eventually told him that snapshots shouldn’t be used as backups. Right.

Snapshots are actually great when they are done properly, with a few stored off-site. They are fast to restore and generally very reliable. However, they only apply to backups made by a hosting company on a virtual machine that it sells. It is not feasible to back up an entire running system consistently from inside that system. Therefore, if you rely on your hosting company for backups, you should have a second backup in case your hosting company gets it wrong, as they often do.

Backup Service Providers

With our managed hosting and managed cloud, we take care of the complexities of backups. Other managed service providers generally do the same. I would still urge clients to keep their own backups, particularly in case of a dispute between the provider and the client, but generally someone else is worrying about it.

For others, on unmanaged services, consider Tarsnap for your most sensitive data. If you use a control panel such as cPanel, there is already a great backup tool built in that is specifically designed to work with your server and just needs some space. That space should be somewhere else, off your server, and you might be surprised at how cost-effective it can be.

