Custom Database Constraints in Django

julian.a

Posted by julian.a

django

If your goal is to make a reliable and secure web app, you're going to spend some time thinking about data integrity. It's a truism that if your users can put bad data in your database, they will. Django comes with a fair bit of support out of the box for data validation. A ForeignKey will create a database level constraint and the unique_together attribute gives you the ability to add some more complex database level constraints, but often you'll have real world constraints on your data that aren't neatly captured by Django's built in constraints, but could be enforced by your relational database.

While you can (and should!) write model and form validators to catch and handle errors before they become 500 errors, it's a mistake to rely on them to guarantee data integrity. Django may not be the only application modifying your database, and even if it is (and you've got no admins with a tendency towards 'fixing' problems using the database console), it's very hard to really guarantee data integrity with just Django. Model validation provides only very limited guarantees: it works fairly well for data entry via a ModelForm, but validators aren't run on save, or on queryset update, or through any of the other mechanisms Django provides for putting data in the database. Models are updated and modified through dozens of different code paths, and it's just plain bad design to try to catch every instance and make sure validation happens. The database itself is where you should validate data.

Here's the good news: if you're not afraid of writing a line or two of SQL, Django makes it very easy to impose any database constraint you want with just a few minutes work.

So lets suppose we're writing an online liquor store and want to store some user profile information. For legal reasons we'll ask users their date of birth, and only want to allow users who are at least 21 years old to register, and we want to impose a database constraint to guarantee our condition.

We write our usual profile model as usual. Something like:

1
2
3
4
5
6
7
8
9
    # accounts/models.py
    from django.conf import settings
    from django.db import models
    

    class UserProfile(models.Model):
        user = models.ForeignKey(settings.AUTH_USER_MODEL)
        date_of_birth = models.DateField(validators=[validate_over_twenty_one])
        ...

Note that we're still doing model validation so our ModelForms will work smoothly. Next, we generate our migrations; the operations section should looks something like:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
    # accounts/migrations/0001_initial.py
    ...
        operations = [
            migrations.CreateModel(
                name='UserProfile',
                fields=[
                    ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                    ('date_of_birth', models.DateField(validators=[accounts.models.validate_over_twenty_one])),
                    ('user', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to=settings.AUTH_USER_MODEL)),
                ],
            ),
        ]

We add our custom SQL after the CreateModel operation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
    # accounts/migrations/0001_initial.py
    ...
        operations = [
            ...
            migrations.RunSQL(
                """
                ALTER TABLE accounts_userprofile ADD CONSTRAINT is_over_21 CHECK (
                    date_of_birth <= now() - interval '21 years'
                );
                """,
                """
                ALTER TABLE accounts_userprofile DROP CONSTRAINT is_over_21;
                """
            ),
        ]

Now we run our migrations and we're done! All it took was 10 lines of extra code, and now we can be completely confident that no user profiles with an age under 21 years can be created. Note that we could write this as a separate migration, but by doing it in this migration we ensure that the database is never in a state where bad data can get in. For this particular example I've used some postgres specific syntax just because it's what I'm familiar with, but something similar should work for most databases. This particular example is a little contrived to keep it simple: having an account with an age under 21 years isn't really an issue with data integrity, but situations where real data integrity concerns exist come up constantly, and this basic technique works most of the time.

The beautiful part about this approach is that Django doesn't need to know anything about the constraint. If you try to add bad data, you'll get an IntegrityError, but that's better than running into a confusing bug down the line when the bad data bubbles back to the surface. If you can anticipate the error, you can catch and handle it cleanly, and if you miss the error, at least you just get a 500 error for the relevant request. Once you get used to doing this, adding a constraint is 15 minutes of work, and makes debugging and reasoning about your code much easier.

There's a natural tendency when working in a framework, especially a 'batteries included' framework like Django, to avoid doing anything outside the framework itself. Often that's a good idea. But I would argue that part of the power of Django is that it provides easy low level access to your database when you need it. Django may eventually support more sophisticated constraints out of the box. As I write this there's an 6 year old bug in the Django bug tracker to "Add the ability to use database-level CHECK CONSTRAINTS" that switched from "Someday/Maybe" to "Accepted" six months ago! But we don't have that support yet, and I suspect that a large part of the reason the Django devs have been so slow to add this feature is that it's already there.

If you want to use more sophisticated constraints in your Django database, you can, and if you're not already, you really should be!

Return to Articles & Guides