I've now finally gotten around to adding PostgreSQL support to Adamanteus! It wasn't really difficult, just hard to find time to do with everything that's been going on lately (quite busy at work and with the new house). The usage of Adamanteus hasn't changed at all, now you can just specify 'postgres' as your backend rather than just 'mongodb' or 'mysql'.
There is one slight caveat, however: the pg_dump utility does not allow you to provide a password non-interactively. For the time being, at least, that means that if you're using Adamanteus to back up a PostgreSQL database you can't specify a password (it will throw an exception if you do). The solution to this is to either 1) set up a read-only passwordless user for running backups, or 2) set up a .pgpass file on the machine from which you intend to run your backups (documentation here). I recommend the .pgpass option, it's quick and easy.
Adamanteus 0.5 is available from both bitbucket and PyPi.
Backing up databases is one of those things that I've always felt could be done in a better way. Traditionally I've done it with a simple shell script that used mysqldump or pg_dump to dump my database to an SQL file named using a timestamp, compress it, and maybe scp it off to some remote server for redundancy. This approach works just fine, except that I recently took a look at my backup directory for a project using that setup only to discover that there were nearly 5000 backup files taking up 11 GB (and this is using bzip2 to compress them!). Obviously not an optimal situation, especially considering that really very little changes from backup to backup, and it's quite possible that nothing changes at all for some of them. It simply makes no sense to store an entire dump of your database every single time!
Fortunately, this is a very familiar situation that we've got advanced tools to handle: version control systems. So I decided to write a little program to replace my shell script that would use a modern, advanced version control system to provide a much more reasonable solution. What I came up with was Adamanteus, a command line program written in Python that allows you to back up your database into a mercurial repository. It currently supports MongoDB and MySQL, and I plan on adding PostgreSQL support this weekend.
Using Mercurial immediately solves basically all the problems with my original approach. It stores diffs rather than full files, meaning you aren't wasting space with a lot of duplicate information. It also handles compression transparently keeping the file sizes down even for the diffs. Plus, because Mercurial is a distributed version control system it's very easy to provide redundancy by pushing and pulling to and from remote repositories. (Pushing/pulling to/from remote repositories isn't currently implemented, but that's also in my plans for this weekend.)
The project is far from complete, but I think it's sufficiently far developed to release as 0.1. Plans for the 1.0 release include:
- PostgreSQL support
- The ability to restore your database from a particular revision in the repository
- automated cloning/pushing/pulling of the repository
- Integration with Django as a management command
I think this is actually pretty close and it probably won't take too long for me to implement all of those, so hopefully I'll be able to push out a 1.0 release very soon. The one other issue holding up 1.0 is that I'd like to wait for MongoDB 1.5 which will bring
mongoxport functionality in line with
mongodump which is what I'm currently using. The issue here is that mongodump produces binary data files which don't play quite as nice with version control and lose you the advantage of only storing diffs. Mongoexport will export JSON or CSV files, which
will allow it to take full advantage of Mercurial,
but until 1.5 there's no easy way to use mongoexport to dump all the collections in a database which is the default behavior for mongodump.
Anyway, I'm definitely looking forward to some feedback on this project, as I suspect it could be quite useful to many people. Contributions are always welcome as well!
I just saw this very interesting (for a programmer) blog entry on Storing IP addresses as integers. As the article says, anyone who wants to store IP addresses in a database is generally going to do as a string. However if you're really concerned about memory efficiency, you might want to find a lighter data type to use. Since an IP address is really just a collection of integers it would make sense to store it as an integer, however doing so without losing important information (specifically, the placements of the dots) becomes an issue.
The blog post in question introduces the the pack() and unpack() functions, which 'allow you to create and extract data into and out of binary-packed strings' and provides the Ruby code necessary to encode an IP address as a simple integer and decode it back to a dotted quad. I thought this was pretty cool, so I decided to write the equivalent Python code. This is what I ended up with:
It seems to work as advertised, though the packed string the encode() function returns is different from what Ruby gives you, so they won't interoperate (it would probably only require changing the pack() and unpack() formats to do so, but I didn't feel like experimenting to figure out which ones).
Of course, being a Django guy, the immediate application of these functions that I see is creating a new IPAddressField that stores the address as an integer instead of a string transparently.
As you probably know, I've been working on a Django-based re-build of BostonChefs.com (the new version of which is actually live now, but due to DNS propagation issues isn't yet available to 100% of people which is why I haven't yet written a post about it). Among other things, BostonChefs.com provides information on some of the fantastic restaurants in the Boston area. One piece of information it provides is the hours of operation of those restaurants. In order to store this information I created a model called HoursOfOperation. It looks like this:
class HoursOfOperation(models.Model):
DAY_CHOICES = (
('0', 'Sun'),
('1', 'Mon'),
('2', 'Tue'),
('3', 'Wed'),
('4', 'Thur'),
('5', 'Fri'),
('6', 'Sat'),
)
restaurant = models.ForeignKey("Restaurant")
meal_period = models.ForeignKey("MealPeriod")
day = models.CharField(max_length=3, choices=DAY_CHOICES)
open_time = models.TimeField(default=datetime.datetime.now)
close_time = models.TimeField(default=datetime.datetime.now)
def _get_hours(self):
return "%s - %s" % (self.open_time.strftime('%I:%M%p'), self.close_time.strftime('%I:%M %p'))
hours = property(_get_hours)
As you can see, each 'hour' is related to a restaurant and a meal period, which allows us to display the information in a manner similar to that you might find on a store's front sign. For example, if you go to the Grill 23 & Bar page (my personal favorite restaurant in Boston, although Craigie on Main is a decent challenger), you'll see something like this:
DINNER
* Sun: 5:30 p.m.-10 p.m.
* Mon-Thur: 5:30 p.m.-10:30 p.m.
* Fri: 5:30 p.m.-11 p.m.
* Sat: 5 p.m.-11 p.m.
Building a list like that out of the above model proved slightly more difficult that I might have hoped. It required quite a lot of template logic, including writing a custom filter. The block of template code necessary to generate that list looks like this:
<div class="hours">
{% regroup restaurant.hoursofoperation_set.all by meal_period as periods %}
{% for period in periods %}
<div class="hoursMealPeriod">{{ period.grouper }}</div>
{% regroup period.list by hours as hour_list %}
<ul>
{% for hour in hour_list %}
<li>{{ hour.list|collapsedays }}</li>
{% endfor %}
</ul>
{% endfor %}
</div>
As you can see, somewhat complex. Those nested {% regroup %}s can be nasty to wrap your head around, if nothing else. But basically it's taking the set of HoursOfOperation objects related to the restaurant, grouping them by meal period, then taking the subset of those objects for each meal period, and grouping those by the hours of the day they represent. So what you're then left with is a list of all the different time periods (still represented as HoursOfOperation objects) that the restaurant is open for a given meal period, and the days on which it is open during those hours. As you can see above, the days are represented by number of the day of the week (0 for Sunday through 6 for Saturday).
Converting that list integers into something like 'Mon, Wed-Fri' was not very easy, and certainly not something I wanted to try to tackle using Django's template tags. I ended up drawing heavily on my hazy memories of CS 127 (many thanks to Dave who taught me all about recursion way back then) and creating a filter that considers the list of HoursOfOperation objects as a list of those integers, then recursively converts it into a list of lists representing the subsets of contiguous days in the list. So if you start out with [1, 3, 4, 5] you end up with [[1, 1], [3, 5]] which is then converted into 'Mon, Wed-Fri'. After several false starts I ended up with this beauty of a Django template filter:
from django.template import Library
from django.template.defaultfilters import time
from types import ListType
register = Library()
def simplify(index, found, days):
high = index+1
mid = index
low = index-1
if not found:
days[low] = [days[low], days[low]]
if high >= len(days):
if not isinstance(days[-1], ListType):
if days[-1] == days[-2][1]:
days.pop(-1)
else:
days[-1] = [days[-1], days[-1]]
return days
if int(days[high].day) - int(days[mid].day) == 1 and (found or int(days[mid].day) - int(days[low][0].day) == 1):
days[low][1] = days[high]
days.pop(mid)
high = high-1
found = True
else:
if found:
days.pop(mid)
found = False
return simplify(high, found, days)
@register.filter
def collapsedays(value):
hours = "%s-%s" % (time(value[0].open_time), time(value[0].close_time))
days = simplify(1, False, value)
for i in range(len(days)):
if days[i][0] == days[i][1]:
days[i] = days[i][0].get_day_display()
else:
days[i] = "%s-%s" % (days[i][0].get_day_display(),
days[i][1].get_day_display())
return "%s: %s" % (', '.join(days), hours)
View Comments
Tags:
boston,
bostonchefs,
code,
django,
filter,
food,
massachusetts,
programming,
python,
restaurants,
template
One of the Django projects that I've been working on for about a year and and will (fingers crossed) be going live in the very very very near future has involved a lot of modification to Django's admin interface. I plan on writing more about the many, many specific changes that I've made to the interface (without modifying the actual Django codebase, so that the changes can be easily applied by anyone without breaking updates), but to talk about them all at once would make far too long of a post. So I'll be taking them on one at a time.
The change I want to talk about right now involves redirecting the user to the page I want them to be at after they've finished adding or editing an object rather than to that object's model's change_list. Normally, if you're editing an Entry object in the Blog app, when you hit save it will take you to /admin/blog/entry/, which is a list of all the Entry objects in the database. However there are some instances in which this isn't the behavior I want. Once such instance involves model inheritance. Say, for example, you have multiple types of Entries which you've accomodated through multi-table-inheritance. Because the different sub-classes are different models, they all have their own change_lists in the Django admin. But I want to be able to view, edit, and create Entries of all types from one page.
Fortunately, Django makes this fairly easy to accomplish. All that is necessary is to override the appropriate methods in the Entry ModelAdmin. That will end up looking something like this:
class EntryAdmin(admin.ModelAdmin):
def change_view(self, request, object_id, extra_context=None):
result = super(EntryAdminAdmin, self).change_view(request, object_id, extra_context)
if not request.POST.has_key('_addanother') and not request.POST.has_key('_continue'):
result['Location'] = iri_to_uri("/admin/blog/entry/")
return result
The exact same modification should be made to add_view as well, and a nearly identical modification to delete_view though it doesn't need to deal with the _addanother and _continue cases. You can then use the EntryAdmin class for all of your varioud Entry sub-classes, or, if you need some other changes to the admin for different Entries sub-classes you can sub-class EntryAdmin for them. Now, whenever you hit the save button after editing any sort of Entry, it will always take you back to /admin/blog/entry/ rather than /admin/blog/linkentry/ or whatever your other subclasses are. If you want it to only take you back to /admin/blog/entry/ if you're coming from a particular page and otherwise take you to /admin/blog/linkentry/ all you need is to add a GET variable to your url (something like '/admin/blog/linkentry/add/?return_to_main=True') and then check for it in your modified change_view, add_view, and delete_view methods with a request.GET.get('return_to_main', False). I've even used this between objects of different model types to create a 'dashboard' page that allows you to view, and alter the relationships between an object of one type with objects of another type. All that's necessary in a case like that is to pass the id of the object in your GET variable and take that into account when creating your uri. An added benefit of that it makes it easy to auto-fill the ForeignKey field when creating a related object. In such a case you'll also need to keep that GET variable in the URLs down the line in order to maintain compatilibility with the 'Save and add another' and 'Save and continue editing' features. But that's still a simple modification:
class EntryAdmin(admin.ModelAdmin):
def change_view(self, request, object_id, extra_context=None):
result = super(EntryAdminAdmin, self).change_view(request, object_id, extra_context)
other_id = request.GET.get("other_id", None)
if not request.POST.has_key('_addanother') and not request.POST.has_key('_continue'):
if other_id:
result['Location'] = iri_to_uri("/admin/dashboard/%s/") % other_id
return result
elif request.POST.has_key('_continue'):
if other_id:
result['Location'] = iri_to_uri("?other_id=%s" % other_id)
return result
elif request.POST.has_key('_addanother'):
if other_id:
result['Location'] = iri_to_uri("%s?other_id=%s" % (result['Location'], other_id))
return result
return result
But more on that sort of thing in other posts.
Django provides a lot of really useful tools to simplify the development process and let you focus only on the important bits. The FileField and ImageField (a subclass of FileField) are good examples of that letting you simply tell Django that your model will have a file or image and letting it take care of the issues of uploading, storing, and all that. In the past, that's really been enough. It will even automatically delete the file/image when you delete its parent model object. One thing it doesn't do, however (and this has been the topic of much debate), is delete a previously uploaded file/image when you upload a new one for the same model. What I mean by this is if you have a model with a FileField defined and use it to upload some file associated with an object. If you then later decide that you want a different file associated with that object and upload it, it doesn't delete the original file and instead leaves it in place and only changes the path stored in the database to point to the new file. Depending on the nature and traffic your website gets, this can lead to massive amounts of storage being wasted on orphaned files (assuming you don't want to keep those old files, of course). I toyed with a couple different approaches to this, including the possibility of subclassing the FileField to try and add that functionalty directly to the field. While this would probably work, I instead opted for a less generalized method: overriding the save() method of the model to take care of this:
def save(self, force_insert=False, force_update=False):
try:
old_obj = ModelName.objects.get(pk=self.pk)
if old_obj.image.path != self.image.path:
path = old_obj.image.path
default_storage.delete(path)
except:
pass
super(ModelName, self).save(force_insert, force_update)
This work perfectly, though it has the disadvantage of being specific to a particular model, which violates the DRY principle (assuming you're going to use it on more than one model). Fortunately there's a simple way to solve this problem too: subclassing models.Model and then instead of having all your models subclass models.Model directly, have them subclass your own version of it instead. In that case you'll probably want to have it work generically on all ImageFields and/or FileFields rather than having to name them all specifically. This isn't too hard, and you can build up a list of all the ImageFields for a particular model like this (taken from the sorl-thumnail project):
for field in model._meta.fields:
if isinstance(field, models.ImageField):
if field.upload_to.find("%") == -1:
paths = paths.union((field.upload_to,))
[Edit: There was a bug in my code that I've corrected. Details are below in the comment by Comete.]
[Edit2: Fixed another problem in the code with variable names. Thanks, again Comete!]