Adamanteus: versioned backups of databases using Python and Mercurial
Mar 26, 2010

Backing up databases is one of those things that I've always felt could be done better. Traditionally I've used a simple shell script that ran mysqldump or pg_dump to dump my database to an SQL file named with a timestamp, compressed it, and sometimes scp'd it to a remote server for redundancy. This approach works fine, except that when I recently looked at the backup directory of a project using that setup, I found nearly 5000 backup files taking up 11 GB (and that's with bzip2 compression!). That's clearly not optimal, especially since very little changes from one backup to the next, and some backups may not differ at all. It simply makes no sense to store an entire dump of your database every single time!

Fortunately, this is a very familiar problem, and we have advanced tools to handle it: version control systems. So I decided to write a small program to replace my shell script, one that uses a modern version control system to provide a much more reasonable solution. The result is Adamanteus, a command-line program written in Python that backs up your database into a Mercurial repository. It currently supports MongoDB and MySQL, and I plan to add PostgreSQL support this weekend.
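To make the idea concrete, here is a minimal sketch of one backup run for MySQL, assuming an existing Mercurial repository. This is not Adamanteus's actual code; `backup_commands` and the fixed `dump.sql` filename are hypothetical. The key point is that each run overwrites the same file inside the repository and commits it, so Mercurial stores only what changed instead of a new full dump.

```python
from datetime import datetime


def backup_commands(database, repo_path, dump_name="dump.sql", when=None):
    """Return the argv lists for one backup run (hypothetical helper).

    Assumes a MySQL database and an existing Mercurial repository at repo_path.
    """
    when = when or datetime.now()
    dump_path = f"{repo_path}/{dump_name}"
    message = f"Backup of {database} at {when:%Y-%m-%d %H:%M:%S}"
    return [
        # Overwrite the same file every run so Mercurial can store a delta.
        ["mysqldump", f"--result-file={dump_path}", database],
        # Start tracking the dump file if this is the first run.
        ["hg", "--repository", repo_path, "addremove"],
        # Record a new revision; hg exits with status 1 if nothing changed.
        ["hg", "--repository", repo_path, "commit", "--message", message],
    ]


if __name__ == "__main__":
    for argv in backup_commands("blog", "/var/backups/blog"):
        print(" ".join(argv))
```

To actually run the steps you could pass each list to `subprocess.run`. Note that `hg commit` returning 1 when the dump is unchanged should be treated as "nothing to back up" rather than as a failure.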

Using Mercurial solves basically all the problems with my original approach. It stores diffs rather than full files, so you aren't wasting space on duplicate information. It also handles compression transparently, keeping the stored size down even for the diffs. Plus, because Mercurial is a distributed version control system, it's very easy to provide redundancy by pushing to and pulling from remote repositories. (Pushing and pulling aren't implemented yet, but that's also in my plans for this weekend.)
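To get a feel for why storing diffs helps so much, the sketch below builds two fake SQL dumps that differ by a single row and compares the compressed size of the full dump with the compressed size of a diff between them. It uses `difflib` and `zlib` only as stand-ins: Mercurial uses its own binary delta format and storage layout, so real numbers will differ, but the shape of the result is the same.

```python
import difflib
import zlib


def make_dump(rows):
    """Build a fake SQL dump with one INSERT statement per row."""
    lines = ["CREATE TABLE posts (id INT, title VARCHAR(100));"]
    lines += [f"INSERT INTO posts VALUES ({i}, '{title}');" for i, title in rows]
    return "\n".join(lines) + "\n"


old_rows = [(i, f"Post number {i}") for i in range(1, 1001)]
new_rows = old_rows + [(1001, "Adamanteus 0.1 released")]

old_dump = make_dump(old_rows)
new_dump = make_dump(new_rows)

# A unified diff holds only the changed lines plus a little context.
delta = "".join(difflib.unified_diff(
    old_dump.splitlines(keepends=True),
    new_dump.splitlines(keepends=True),
    fromfile="dump.sql",
    tofile="dump.sql",
))

full_size = len(zlib.compress(new_dump.encode()))
delta_size = len(zlib.compress(delta.encode()))
print(f"compressed full dump: {full_size} bytes")
print(f"compressed delta:     {delta_size} bytes")
```

Storing a timestamped copy means paying `full_size` on every run, whereas a version control system pays roughly `delta_size` per run after the first.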

The project is far from complete, but I think it's developed far enough to release as 0.1. Plans for the 1.0 release include:

  • PostgreSQL support
  • The ability to restore your database from a particular revision in the repository
  • Automated cloning/pushing/pulling of the repository
  • Integration with Django as a management command
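Restoring from a particular revision is still only planned, but one plausible way to do it is to have `hg cat --rev` write the dump as it was at that revision to stdout and pipe it into the `mysql` client. The sketch below assumes the hypothetical `dump.sql` layout from a single-file MySQL backup; `restore_pipeline` and `run_pipeline` are illustrative names, not part of Adamanteus.

```python
import subprocess


def restore_pipeline(database, repo_path, revision, dump_name="dump.sql"):
    """Return (producer, consumer) argv lists for restoring one revision.

    Hypothetical helper: assumes a single MySQL dump file in the repository.
    """
    # hg cat prints the file as of `revision` without touching the working copy.
    producer = ["hg", "--repository", repo_path, "cat",
                "--rev", str(revision), f"{repo_path}/{dump_name}"]
    # The mysql client executes the SQL it reads on stdin.
    consumer = ["mysql", database]
    return producer, consumer


def run_pipeline(producer, consumer):
    """Pipe producer's stdout into consumer's stdin, failing if either fails."""
    src = subprocess.Popen(producer, stdout=subprocess.PIPE)
    try:
        subprocess.run(consumer, stdin=src.stdout, check=True)
    finally:
        src.stdout.close()
    if src.wait() != 0:
        raise subprocess.CalledProcessError(src.returncode, producer)
```

Keeping command construction separate from execution makes the restore easy to dry-run or log before it overwrites a live database.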
I think this is actually pretty close, and it probably won't take long to implement all of those, so hopefully I'll be able to push out a 1.0 release very soon. The one other issue holding up 1.0 is that I'd like to wait for MongoDB 1.5, which will bring mongoexport's functionality in line with mongodump, which is what I'm currently using. The problem is that mongodump produces binary data files, which don't play as nicely with version control and cost you the advantage of storing only diffs. mongoexport exports JSON or CSV files, which lets it take full advantage of Mercurial, but until 1.5 there's no easy way to use mongoexport to dump all the collections in a database, which is mongodump's default behavior.
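Until mongoexport can handle a whole database, one workaround is to run it once per collection. The sketch below assumes the caller already has the collection names (for example from a driver call such as pymongo's `list_collection_names()`); `mongoexport_commands` is a hypothetical helper. It skips `system.*` collections, which hold server metadata rather than application data.

```python
def mongoexport_commands(database, collections, out_dir):
    """Build one mongoexport invocation per collection (hypothetical helper)."""
    commands = []
    # Sort so files are always produced in the same order between runs.
    for name in sorted(collections):
        if name.startswith("system."):
            continue  # server metadata, not application data
        commands.append([
            "mongoexport",
            "--db", database,
            "--collection", name,
            "--out", f"{out_dir}/{name}.json",
        ])
    return commands
```

Because mongoexport writes one JSON document per line by default, an update to a single document shows up as a one-line change in that collection's file, which is exactly what Mercurial stores efficiently.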

Anyway, I'm definitely looking forward to some feedback on this project, as I suspect it could be quite useful to many people. Contributions are always welcome as well!
