boosting generic queries with Django Haystack / SOLR

More Django/Haystack/SOLR fun today: boosting a specific field on a generic query. Now, that would seem trivial using the "boost" option on a searchfield, but it isn't. It turns out this option only works if you explicitly query on the boosted field. 

Consider the following definition:

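from haystack.indexes import CharField, SearchIndex

class SomeIndex(SearchIndex):
    text = CharField(document=True, use_template=True, boost=.5)  # catch-all document field, weighted down
    title = CharField(model_attr='title', boost=1.5)  # title matches should count for more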

Say I have two records, one with (title="Foo", text="Foo Foo Bar") and the other with (title="Bar", text="Foo Foo Foo"). For the query q=Foo, I want the first record to rank highest, since it contains "Foo" in the title.

But that doesn't appear to be the way Haystack and SOLR work (out of the box). It would have worked if I had queried title=Foo, but then the second record wouldn't have matched anyway.
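To make the desired ranking concrete, here is a toy sketch in plain Python. It is not how Solr or Lucene actually score documents (the real formula involves term statistics, norms and more); the toy_score and rank helpers are made up for illustration and simply sum boost times term count per field.

```python
# Toy model only: real Solr/Lucene scoring is considerably more involved.
records = [
    {"title": "Foo", "text": "Foo Foo Bar"},
    {"title": "Bar", "text": "Foo Foo Foo"},
]

def toy_score(record, term, boosts):
    """Sum of (boost * occurrences of term) over the given fields."""
    return sum(
        boost * record[field].split().count(term)
        for field, boost in boosts.items()
    )

def rank(term, boosts):
    """Return record titles ordered from highest to lowest toy score."""
    ordered = sorted(records, key=lambda r: toy_score(r, term, boosts), reverse=True)
    return [r["title"] for r in ordered]

# Generic query that effectively only looks at the document field.
text_only = rank("Foo", {"text": 1.0})

# Query that searches both fields with the boosts from SomeIndex.
boosted = rank("Foo", {"text": 0.5, "title": 1.5})
```

With only the text field in play, the "Bar" record wins because it simply contains more Foos; once the title boost counts, the "Foo" record comes out on top, which is the behaviour I was after.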

The solution turned out to be the "eDisMax" query parser. To use it, replace the "search" requestHandler(s) in your solrconfig.xml with something along the lines of:

<requestHandler name="search" class="solr.SearchHandler" default="true">
 <lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">500</int>
   <str name="spellcheck.onlyMorePopular">false</str>
   <str name="spellcheck.extendedResults">false</str>
   <str name="spellcheck.count">1</str>
   <str name="defType">edismax</str>
   <str name="q.alt">*:*</str>
   <str name="qf">text^1 description^1 title^1</str>
   <str name="pf">text^1 description^1 title^1</str>
   <str name="fl">*,score</str>
   <str name="mm">100%</str>
   <int name="ps">100</int>
 </lst>
 <arr name="last-components">
   <!-- keep the components from your existing handler here -->
 </arr>
</requestHandler>

The qf and pf parameters carry the per-field boosts, but I've set them all to "1" because it appears that Haystack's boosting does work with this configuration. This means you can choose between:

  1. Configure boosting in solrconfig.xml and don't specify a boost in your SearchIndex
  2. Keep a generic boosting in solrconfig.xml and specify the actual boost in your SearchIndex

As you've guessed, I've chosen the latter. It appears that neither case requires re-indexing when changing the boost.
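For the first option, the boosts live in the qf string itself, for example text^0.5 title^1.5 instead of the ^1 values above. Solr also accepts defType and qf as ordinary request parameters, which override the handler defaults, so the format can be sketched in Python. Note that build_edismax_params is a hypothetical helper of my own, not part of Haystack or pysolr, and it only builds a query string without contacting Solr.

```python
from urllib.parse import parse_qs, urlencode

def build_edismax_params(query, boosts):
    """Build Solr request parameters with per-field boosts in qf (hypothetical helper)."""
    # qf is a space-separated list of field^boost pairs.
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"q": query, "defType": "edismax", "qf": qf}

params = build_edismax_params("Foo", {"text": 0.5, "title": 1.5})

# URL-encoded form, ready to append to a Solr select URL.
query_string = urlencode(params)

# Round-trip check: decoding gives back the same qf value.
decoded_qf = parse_qs(query_string)["qf"][0]
```

Keeping the boosts in one place, either here or in the SearchIndex, avoids having to guess which of the two is actually influencing the ranking.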

Last updated April 18, 2013, 4:35 p.m. | filed under python, solr, django | django solr edismax haystack boost