Partial Word Search With Postgres Full Text Search in Django
Posted by brad on Sept. 6, 2017, 10:19 a.m.
Recently we ran into an issue with our search interface for a project we're working on here at Fusionbox: the service backing our Django Haystack implementation experienced a momentary hiccup and all of a sudden our search index was out of sync with the state of our database. Luckily this project is still in the development phase and has not yet been released to users, but before we figured out that solr had experienced downtime, we were seeing some strange bugs in our application.
In our experience this is not an uncommon situation. It's the classical cautionary tale of distributed systems. Sure, it's possible to attempt to get around this issue by increasing system monitoring; by using cron jobs to ensure the search index is always up to date; by adding more search backend service instances. But these solutions will always eventually come up short, because you cannot depend on one database to always remain in sync with another, separate database.
Enter Django's Postgres-backed Full Text Search feature. I won't go into the details here (the docs do a pretty good job on that front), but suffice it to say, this is a great solution if you are worried about any kind of discrepancy between your application and search databases. Now, instead of relying on keeping another service's database in sync with your own application database, you can just use your own Postgres database to search over your models' text fields.
In light of our recent mishap, we decided to replace our solr-backed search with Postgres's full text search feature. Going forward we sought to avoid having solr downtime negatively influence our end users' experience. Unfortunately for us we could not leverage Django's built-in support for that feature because our search UI allows users to search using partial words. While this feature is easily implemented using Haystack and solr (simply use
EdgeNGramField fields in your Haystack search index definition(s)), Django's support for Postgres's full text search does not support this functionality.
In order to use Postgres full text search while keeping our coveted partial word search feature, we had to get creative. Luckily this was not a very difficult task; we simply had to code by hand
the Postgres full-text search query we wanted given a user's typed query. So, given a user query
like "fall tas", we wanted to return all objects whose relevant text started with "fall tas" (like, say, "fall tasks"). To do this, we parsed the query using a regex to separate the two words "fall" and "tas", placed an ampersand between the two words, then appended onto the query the special character combination ":*". The final search query string is then "fall & tas:*", which is the type of string Postgres expects when you want to perform a full text search (see this link for more details). Of course once you've created this
tsquery string, it's important to create a
tsvector with all the text you want your
tsquery to search over. This is achievable by creating a
tsvector that contains all the model text you want to search over and ANDing it with your newly-created
tsquery using Django's
extra feature. The result of ANDing the
tsvector with your
tsquery will be such that "true" will mean your
tsquery matched the text in your
tsvector and "false" will obviously mean the opposite.
Finally, you simply have to place this custom filtering code wherever it makes sense for your application. In our case, we use Django Rest Framework views and serializers to provide our React frontend with JSON data, which it then renders. As such, we place this custom partial word search filtering in our serializer's
filter method, which is responsible for filtering the serializer's queryset. Here's a basic template for what our partial word search filter looks like.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
from rest_framework import serializers class MyModelSerializer(serializers.Serializer) def filter(self, qs): <other custom queryset filtering> # Where 'search_query' is something like, "toda", a partial word query = process_query(search_query) # 'query' is now "toda:*" qs = qs.extra( where=[ ''' to_tsvector('english', unaccent(concat_ws(' ', <model_table>.<model column_1>, ..., <model_table>.<model_column_k> ))) @@ to_tsquery('english', unaccent(%s)) ''' ], params=[query], ) <more custom queryset filtering> return qs
1 2 3 4 5 6 7 8 9 10 11 12 13
def process_query(s): """ Converts the user's search string into something suitable for passing to to_tsquery. """ query = re.sub(r'[!\'()|&]', ' ', s).strip() if query: query = re.sub(r'\s+', ' & ', query) # Support prefix search on the last word. A tsquery of 'toda:*' will # match against any words that start with 'toda', which is good for # search-as-you-type. query += ':*' return query
We hope this has convinced you that you need not tie yourself to a rickety search infrastructure simply to keep easily-implemented features like partial word search. Although Django's full text search feature does not allow for partial word search, it is easy to see from the code above that it is not too much of a heavy lift to implement the feature yourself. If you're looking to go down this route but think our short solution above lacks features you want, we think this library looks promising.