Tuesday, April 19, 2011

Problems with Solr 1.4.1 Highlighting Query - Running Slow

I noticed that queries were running really slowly on our Solr 1.4.1 instance, which we use as the search backend for Drupal. Some queries took as long as 20 seconds!

So I started removing parameters from the slow queries one by one until I saw a noticeable difference in query time. Removing hit highlighting did the trick. After a lot of digging around on the internet I found this mailing list post: http://www.mail-archive.com/solr-user@lucene.apache.org/msg28731.html

The problem is the algorithm Solr 1.4 uses for hit highlighting, and it gets much worse when the field you're highlighting holds a large amount of text. You can work around it by creating a copy of the field restricted to 20,000 characters, which gives you a field of 40K or so for Solr to highlight over. The highlighting algorithm performs fine on this trimmed field. As soon as I added the highlight-specific field, query time dropped from 20-something seconds to less than a second!
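
As a rough illustration of the workaround described above, a schema.xml fragment might look like the sketch below. The field names `body` and `body_hl` and the `text` field type are assumptions chosen for illustration and are not taken from the actual schema; `copyField` with the `maxChars` attribute is the standard Solr mechanism for copying a truncated value into another field.

```xml
<!-- Hypothetical field names: "body" is the large source field,
     "body_hl" is the trimmed copy used only for highlighting. -->
<field name="body"    type="text" indexed="true" stored="true"/>

<!-- The copy must be stored so the highlighter can read its text. -->
<field name="body_hl" type="text" indexed="true" stored="true"/>

<!-- Copy at most the first 20,000 characters of "body" into "body_hl". -->
<copyField source="body" dest="body_hl" maxChars="20000"/>
```

With a setup like this, highlighting would be pointed at the trimmed field (for example `hl.fl=body_hl`) while the query itself still searches the full field. The index has to be rebuilt after the schema change so that the new field is populated.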

This doesn't seem to be a problem in Solr 4.x. We have a 4.x instance in production and highlighting works fine there, even for large fields.

4 comments:

  1. What's the performance impact of highlighting on versus highlighting off on your 4.x Solr instance?

    Can you also give me an example of your Solr schema fields (qf and hl.fl)?

    Thanks

  2. Performance in general has been good on our 4.x instance. We have some large PDF documents indexed, and hit highlighting works fine for them. Our average time per request is well under 2 seconds for these full-text queries, which is not bad considering we have only 1GB of memory allocated to the instance.

  3. Can you please tell us how to create a copy of the field and restrict the number of characters to 20k?
    I am new to Solr.

  4. You can use a copyField directive with the maxChars attribute in your schema.xml file to copy the field into another field and limit the number of characters.

    Please refer to http://wiki.apache.org/solr/SchemaXml for more details.
