Making a Django URL Resolver field: A Case Study

Posted by nolan

django Python

Recently, I was presented with a problem.

A client, using the content management system their Django site was built around, could happily link pages together with the help of a link field provided by the CMS, a trivial thing to do via a simple choice field populated by existing pages. The problem presented itself when the client asked how she could link to a page on the site not created by the CMS, i.e. a view Django resolves using a URL pattern, for example www.example.com/admin/ or www.example.com/accounts/1/.

I was intrigued, and in working through a solution, came to learn a lot about url patterns and the underlying mechanisms Django provides to create custom fields. I'll lay out the use cases to be satisfied, the steps taken in building the architecture, and the decisions behind those steps.

Requirements

URL Pattern Choice

A user of this field will need to choose from the set of all URL patterns known by the site. Django's TypedChoiceField seems like a good candidate for this: the patterns can be presented as a set of choices, each of which represents an instance of the same type (a URL pattern), and the field's coerce method is available if the input needs to be normalized to that type.

Variable URL Patterns

Django's url routing offers lots of versatility. The URLs that my field needs to find could and should come in all forms. In addition to inflexible, hard-coded URLs, ones defined in a looser manner with regular expressions should be usable as well. Take these sample url patterns:

urlpatterns = [
    url(r'^$', home_view),
    url(r'^profile/$', profile_view),
    url(r'^(?P<first_group>\w+)/(?P<second_group>\d+)/$', named_groups_view),
    url(r'^(.+)/(.+)?/$', non_named_groups_view),
]

The first URL is found at /, the absence of a pattern.

The second at /profile/, a static identifier that does not change.

The third is decidedly more complicated. It encompasses a set of URLs that satisfy a pattern containing two regular expression capture groups. Its first and second groups capture word characters and digits, respectively. Among many others, the URL /accounts/1/ resolves to this view.

The fourth is similar to the third, except that its groups can capture any characters. A near-infinite number of URLs such as /a/b/, /hello/world/, /123/xyz/ will resolve to this view.

These last two patterns complicate things significantly. We will need to present an indeterminate number of additional fields to map any and all of a given URL pattern's regex capture groups to a user's inputs.
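
To get a feel for how many inputs each sample pattern would require, Python's re module can report a pattern's capture groups. This is only a sketch using the standard library; describe_groups and SAMPLE_PATTERNS are hypothetical names for illustration, not part of Django.

```python
import re

# The four sample patterns from the urlpatterns list above.
SAMPLE_PATTERNS = [
    r'^$',
    r'^profile/$',
    r'^(?P<first_group>\w+)/(?P<second_group>\d+)/$',
    r'^(.+)/(.+)?/$',
]


def describe_groups(pattern):
    """Return (number of capture groups, names of the named groups)."""
    compiled = re.compile(pattern)
    # groupindex maps each group name to its position; sort by position.
    names = sorted(compiled.groupindex, key=compiled.groupindex.get)
    return compiled.groups, names


for pattern in SAMPLE_PATTERNS:
    total, names = describe_groups(pattern)
    print(pattern, '->', total, 'group(s), named:', names)
```

The first two patterns report no groups and so need no extra inputs, while the last two each report two groups, named in one case and unnamed in the other.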

Looking through Django's fields, RegexField jumps out as the field that could satisfy our need to collect a value that matches the pattern inside a capture group. But what about url patterns like those above that contain multiple capture groups? Django's MultiValueField seems to be made for our case. It could compose multiple RegexFields into a single field, validating that all capture groups are given suitable user input.
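
The idea of one sub-field per capture group can be sketched without Django. In this rough illustration, GROUP_PATTERNS and validate_group_values are made-up names, the group sub-patterns are written out by hand rather than extracted from the URL pattern, and plain re.fullmatch calls stand in for what a set of RegexFields inside a MultiValueField would do.

```python
import re

# Hand-written sub-patterns for the two groups of the third sample pattern.
GROUP_PATTERNS = [r'\w+', r'\d+']


def validate_group_values(group_patterns, values):
    """Return a list of error messages; an empty list means all values fit."""
    if len(values) != len(group_patterns):
        return ['Expected %d values, got %d.'
                % (len(group_patterns), len(values))]
    errors = []
    pairs = zip(group_patterns, values)
    for position, (pattern, value) in enumerate(pairs, 1):
        # Each value must match its group's sub-pattern in full.
        if re.fullmatch(pattern, value) is None:
            errors.append('Value %d (%r) does not match %s.'
                          % (position, value, pattern))
    return errors


print(validate_group_values(GROUP_PATTERNS, ['accounts', '1']))
print(validate_group_values(GROUP_PATTERNS, ['accounts', 'one']))
```

The first call returns no errors, while the second rejects 'one' because it is not made of digits.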

Given these two requirements, it's looking like our job is to somehow combine the URL pattern TypedChoiceField with the RegexFields that comprise our proposed MultiValueField into a single field. Sounds complicated, but worth the endeavor!

URLPatternField

Let's start with that TypedChoiceField, the mechanism for choosing a URL pattern.

URL patterns are typically defined in a module named urls immediately inside of an app. A requisite to run a Django project is the inclusion of a ROOT_URLCONF value inside the project's settings module, something like "myproject.urls". We'll take a urlconf such as this and transform it into our field's choices.

To do this, I took inspiration from the show_urls management command inside the django-extensions package. The command's extract_views_from_urlpatterns method loops through the urlconf's urlpatterns. If the pattern is a RegexURLPattern, e.g. url(r'^profile/$', profile_view), the pattern's view and a simplified version of its regex, /profile/, are added to the returned list. If the pattern is a RegexURLResolver, e.g. url(r'^accounts/', include('accounts.urls')), the method recurses with that resolver's patterns, e.g. url(r'^login/$', login_view), url(...), and appends each resulting simplified regex, prefixed with the resolver's regex (/accounts/login/, ...), to the returned list.
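
The recursion described above can be sketched with plain tuples and lists standing in for Django's pattern and resolver objects. This is not the django-extensions code: flatten, the nested-list convention, and logout_view are assumptions made for the example, and the regex simplification step is left out.

```python
def flatten(patterns, base=''):
    """Flatten nested (regex, target) pairs into (full_regex, view_name).

    A target is either a view name (a plain pattern) or a list of further
    pairs (standing in for an include()).
    """
    flat = []
    for regex, target in patterns:
        full_regex = base + regex
        if isinstance(target, list):
            # Resolver-like entry: recurse, carrying the accumulated prefix.
            flat.extend(flatten(target, full_regex))
        else:
            flat.append((full_regex, target))
    return flat


accounts_patterns = [
    (r'^login/$', 'login_view'),
    (r'^logout/$', 'logout_view'),
]
root_patterns = [
    (r'^profile/$', 'profile_view'),
    (r'^accounts/', accounts_patterns),
]

for full_regex, view_name in flatten(root_patterns):
    print(view_name, full_regex)
```

Each nested pattern comes back with its parent's regex prepended, which is the same shape of result the class below builds from real resolver objects.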

I'll define a class that is to be instantiated with the name of a urlconf, such as the value of settings.ROOT_URLCONF. It uses Django's urls.resolvers.get_resolver function to return a RegexURLResolver object we can iterate through. The populate method is like the show_urls command's extract_views_from_urlpatterns method above.

import types
from collections import OrderedDict
from importlib import import_module

from django.urls.resolvers import get_resolver, RegexURLPattern, RegexURLResolver
from django.utils.functional import lazy

class URLPatterns(object):
    cache = {}

    def __new__(cls, urlconf):
        if urlconf in cls.cache:
            return cls.cache[urlconf]
        return super(URLPatterns, cls).__new__(cls)

    def __init__(self, urlconf):
        if self in self.cache.values():
            return
        self.urlconf = urlconf
        self.patterns = self.get_patterns()
        self.cache[urlconf] = self

    def __getitem__(self, key):
        if key in self.patterns:
            return self.patterns[key]
        raise KeyError(key)

    def __iter__(self):
        return iter(self.patterns)

    def items(self):
        return self.patterns.items()

    def _get_patterns(self):
        resolver = get_resolver(self.urlconf)
        return self.populate(resolver)
    get_patterns = lazy(_get_patterns, OrderedDict)

    def populate(self, resolver, base='', namespace=None):
        result = OrderedDict()
        for obj in resolver.url_patterns:
            pattern = obj.regex.pattern
            if base:
                pattern = base + pattern
            if isinstance(obj, RegexURLPattern):
                if not obj.name:
                    name = obj.lookup_str
                    pkg, viewname = name.rsplit('.', 1)
                    try:
                        module = import_module(pkg)
                        view = getattr(module, viewname)
                    except (ImportError, AttributeError):
                        continue
                    if not isinstance(view, types.FunctionType):
                        continue
                elif namespace:
                    name = '%s:%s' % (namespace, obj.name)
                else:
                    name = obj.name
                result[name] = pattern
            elif isinstance(obj, RegexURLResolver):
                if namespace and obj.namespace:
                    ns = '%s:%s' % (namespace, obj.namespace)
                else:
                    ns = obj.namespace or namespace
                result.update(self.populate(obj, pattern, ns))
        return result

Note that for each RegexURLPattern encountered, the result dictionary maps a view name to the pattern. That view name could be one of these three things:

  • the view's actual name, as used in the url(), e.g. url(r'^$', login_view, name='login'), "login" in this case,
  • the view's name, as above, with its associated namespace, e.g. "accounts:login",
  • a string representing the actual view function's module path, e.g. "accounts.views.login_view"

The reason for this mapping will become clear when the time comes to actually resolve URLs from our field.

Included are a few optimizations:

For one, urlconfs typically do not change over the lifetime of a site's deployment. As such, we shouldn't waste effort getting patterns from a urlconf every time a URLPatterns object is created with the same one. If an object has already been created for a urlconf, the __new__ method short-circuits object creation and returns that object from the cache class attribute, which is keyed by urlconf (note the last line of the __init__ method, which stores it).
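
Stripped of the URL handling, the caching trick looks roughly like this. CachedByKey and its builds list are hypothetical names used only to show that the expensive part of __init__ runs once per key.

```python
class CachedByKey(object):
    cache = {}   # key -> already-built instance
    builds = []  # records every key that was actually initialized

    def __new__(cls, key):
        # Reuse the instance already built for this key, if there is one.
        if key in cls.cache:
            return cls.cache[key]
        return super(CachedByKey, cls).__new__(cls)

    def __init__(self, key):
        # __init__ runs on every call, even when __new__ returned a cached
        # object, so bail out early in that case.
        if self.cache.get(key) is self:
            return
        self.key = key
        self.builds.append(key)
        self.cache[key] = self


first = CachedByKey('demo.urls')
second = CachedByKey('demo.urls')
other = CachedByKey('other.urls')

print(first is second)     # True
print(first is other)      # False
print(CachedByKey.builds)  # ['demo.urls', 'other.urls']
```

The early return in __init__ matters: Python calls __init__ whenever __new__ returns an instance of the class, cached or not.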

Another optimization is the lazy evaluation of the get_patterns method. populate can be a computationally expensive operation, depending on the number of URL patterns in the urlconf. We can use Django's utils.functional.lazy to ensure get_patterns is only actually executed when its results need to be evaluated.
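
The general idea of deferring work until a result is needed can be shown in a few lines of plain Python. This sketch is not how django.utils.functional.lazy is implemented; LazyMapping and expensive_populate are invented for the example, and remembering the result after the first use is a choice made here for simplicity.

```python
from collections import OrderedDict


class LazyMapping(object):
    """Defer a mapping-building function until the mapping is first used."""

    def __init__(self, build):
        self._build = build
        self._evaluated = False
        self._result = None

    def _evaluate(self):
        # Run the builder on first access only, then reuse its result.
        if not self._evaluated:
            self._result = self._build()
            self._evaluated = True
        return self._result

    def __getitem__(self, key):
        return self._evaluate()[key]

    def __iter__(self):
        return iter(self._evaluate())

    def items(self):
        return self._evaluate().items()


calls = []


def expensive_populate():
    calls.append('populate')
    return OrderedDict([('demo:demo', r'^demo/^$')])


patterns = LazyMapping(expensive_populate)
print(calls)                  # [] - nothing has been computed yet
print(patterns['demo:demo'])  # the lookup triggers the computation
print(calls)                  # ['populate']
```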

Note the URLPatterns object can be iterated over, and have members looked up and itemized by passing these responsibilities down to its patterns attribute, itself an OrderedDict instance.

Let's try it out with a sample urlconf:

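# inside demo/urls.py
from django.conf.urls import include, url
from django.http import HttpResponse

# DemoView, named_groups_view and non_named_groups_view are assumed to be
# defined or imported elsewhere in this module.

# A (patterns, app namespace) tuple, as accepted by include().
demo_patterns = ([
    url(r'^$', DemoView.as_view(), name='demo'),
    url(r'^(?P<first_group>\w+)/(?P<second_group>\d+)/$', named_groups_view, name='named-groups'),
    url(r'^(.+)/(.+)?/$', non_named_groups_view, name='non-named-groups'),
], 'demo')

def demo_view(request):
    return HttpResponse("demo")

urlpatterns = [
    url(r'^demo-function/$', demo_view),  # unnamed, so keyed by its module path
    url(r'^demo/', include(demo_patterns)),
]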

In a shell,

>>> url_patterns = URLPatterns(urlconf='demo.urls')
>>> print(list(url_patterns.items()))
[('demo.urls.demo_view', r'^demo-function/$'),
 ('demo:demo', r'^demo/^$'),
 ('demo:named-groups', r'^demo/^(?P<first_group>\w+)/(?P<second_group>\d+)/$'),
 ('demo:non-named-groups', r'^demo/^(.+)/(.+)?/$')]

We're ready to use this URLPatterns object to seed the choices for our proposed field. Let's define a subclass of TypedChoiceField that takes a urlconf argument:

import types
from django.forms.fields import TypedChoiceField

class URLPatternField(TypedChoiceField):

    def __init__(self, urlconf=None, *args, **kwargs):
        required = kwargs.get('required', True)

        if urlconf is None:
            from django.conf import settings
            urlconf = settings.ROOT_URLCONF
        elif isinstance(urlconf, types.ModuleType):
            urlconf = urlconf.__name__

        url_patterns = URLPatterns(urlconf)
        
        # Choices need to be retrieved from a callable otherwise an exception could be raised on
        # app initialization when iterating over urlconf's patterns.
        def choices():
            choices = []
            if not required:
                choices.append(('', "---------"))
            # items() may not be a list (it is a view on Python 3), so convert it.
            return choices + list(url_patterns.items())

        kwargs['choices'] = choices

        super(URLPatternField, self).__init__(*args, **kwargs)

If the urlconf argument is not given, the field will use the settings' ROOT_URLCONF. For convenience and an alternate use-case, the field could also accept the urlconf as an actual module. The isinstance(urlconf, types.ModuleType) condition takes the name of the module as a string and uses that instead.

Note the choices function defined inside the __init__ method here. We're defining choices as a callable, even when we could seemingly just assign choices to url_patterns.items() as we are doing inside the inner function. If we were to do this, however, an ImproperlyConfigured exception would be raised because we would be iterating over the urlconf before the application was loaded, causing a circular import.

Inside that choices callable, we also prepend an empty choice to the items, provided the field being initialized is not required.
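
Why a callable helps can be shown with a small stand-in for the field. ChoiceHolder, make_choices and load_items are hypothetical names and this is not Django's implementation; the point is only that handing over a function postpones the work until the choices are actually read.

```python
class ChoiceHolder(object):
    """Tiny stand-in for a choice field that accepts a list or a callable."""

    def __init__(self, choices):
        self._choices = choices

    @property
    def choices(self):
        # Only call the function when the choices are actually needed.
        if callable(self._choices):
            return list(self._choices())
        return list(self._choices)


def make_choices(items, required=True):
    """Build a choices callable, adding an empty option when not required."""
    def choices():
        result = []
        if not required:
            result.append(('', '---------'))
        return result + list(items())
    return choices


evaluated = []


def load_items():
    evaluated.append(True)
    return [('demo:demo', r'^demo/^$')]


holder = ChoiceHolder(make_choices(load_items, required=False))
print(evaluated)       # [] - constructing the holder loaded nothing
print(holder.choices)  # [('', '---------'), ('demo:demo', '^demo/^$')]
```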

Finally, once we use this field inside a form, we can see the outcome of our labor: a select tag is rendered with options whose values are string representations of a urlconf's view names, and whose labels are the regular expressions used to resolve these views.

class DemoForm(Form):
    url = URLPatternField()
    urlconf_url = URLPatternField('demo.urls')
    optional_url = URLPatternField(required=False)

This is pretty cool, but we're not nearly done yet. For one, remember our second requirement, that each regex capture group in a pattern must present the user with a field to map input to? This field does nothing of the sort. Any variable URL pattern we choose cannot resolve to a URL, since there is not enough information to do so.

Additionally, presenting a weird-looking regex pattern to the user as an option's label is not-so-good. It's not really that readable to anyone, let alone a layperson user.

In the upcoming Part Two, I'll get my hands dirty building a means of gathering user input to use in those capture groups. In doing so, we'll be introduced to Django's MultiValueField and MultiWidget and the possibilities and complications they bring along.
