Data Catalyst: Solr Caching Explained

The magic is in the Stats
You can access the statistics for your Solr instance at http://localhost:8983/solr/admin/stats.jsp. The Solr admin console has a ton of useful information that you can use to improve performance.

Types of Performance Improvements
You can improve avgTimePerRequest for select queries, which is shown on the stats page. (You need to know which request handler you want to optimize; I look at standard for the out-of-the-box config.) You can also reduce the time it takes for a recently committed change to show up in the index by changing autowarmCount (this is usually small anyway).

1. avgTimePerRequest
To reduce avgTimePerRequest you need to increase your cache sizes. But beware: setting a cache too large can degrade performance, because Solr has to do a lot of cleanup and other work when it starts new searchers (more on this later). The objective is simple: reduce the number of evictions and keep the hit ratio high, and do this for every type of cache Solr has.
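
As a rough illustration of that rule, the Python sketch below takes the lookups, hits and evictions counters that the stats page shows for each cache and flags the caches that are evicting entries while missing a target hit ratio. The function names, the 0.75 threshold and the sample numbers are assumptions made up for this example; they are not Solr defaults.

```python
def hit_ratio(hits, lookups):
    """Fraction of lookups served from the cache (0.0 when the cache is unused)."""
    return hits / lookups if lookups else 0.0


def caches_to_grow(stats, min_ratio=0.75):
    """Return the names of caches that evict entries and miss the target hit ratio.

    `stats` maps a cache name to counters copied by hand from the stats page.
    `min_ratio` is an arbitrary threshold chosen for this sketch.
    """
    flagged = []
    for name, counters in stats.items():
        ratio = hit_ratio(counters["hits"], counters["lookups"])
        if counters["evictions"] > 0 and ratio < min_ratio:
            flagged.append(name)
    return flagged


# Made-up numbers, for illustration only.
sample = {
    "queryResultCache": {"lookups": 1000, "hits": 400, "evictions": 250},
    "documentCache": {"lookups": 5000, "hits": 4800, "evictions": 0},
}
print(caches_to_grow(sample))
```

A cache that shows up here is a candidate for a larger size in solrconfig.xml; re-check the counters after each change rather than raising everything at once.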

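Solr 4.x reports these caches on the stats page out of the box:
a. filterCache
b. queryResultCache
c. fieldCache (maintained by Lucene; it cannot be sized in solrconfig.xml)
d. documentCache
e. fieldValueCache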
The queryResultCache is the most lightweight, i.e. giving it a large size does not eat too much RAM. The documentCache is memory intensive, depending on the number of fields you index per document.

2. autowarmCount
The autowarmCount property of a cache is set in solrconfig.xml. It is the number of cache entries that are handed over from a dying searcher to a new searcher. Every time you commit to the index, Solr starts a new searcher and copies cache entries over from the previous searcher, which keeps handling requests while the copying takes place. This is called autowarming. You can also warm the firstSearcher, which is the searcher that is started when Solr starts. I'd recommend configuring a list of queries to pre-populate the firstSearcher caches so that you are warm enough when Solr starts. This can be done in solrconfig.xml as well.

The larger your autowarmCount, the longer the warmupTime, and hence the longer the delay between a commit and the moment the committed change becomes searchable. If the index is committed very frequently, i.e. every few seconds, it is better to keep autowarmCount low. Otherwise new searchers start warming before the previous ones have finished, and Solr logs a performance warning.
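
To make the trade-off concrete, here is a minimal Python sketch that compares the warmupTime values shown on the stats page with how often you commit. Treating the sum of the per-cache warmupTime values as the total warm-up time is a simplifying assumption of this sketch, and the function names and sample numbers are made up for illustration.

```python
def estimated_warmup_ms(cache_warmup_ms):
    """Rough total warm-up time: the sum of the per-cache warmupTime values."""
    return sum(cache_warmup_ms.values())


def commits_outpace_warming(cache_warmup_ms, commit_interval_ms):
    """True when the next commit is likely to arrive before warming has finished."""
    return estimated_warmup_ms(cache_warmup_ms) >= commit_interval_ms


# Made-up warmupTime values in milliseconds, for illustration only.
warmup = {"queryResultCache": 1800, "fieldValueCache": 400}
print(commits_outpace_warming(warmup, commit_interval_ms=2000))   # commit every 2 seconds
print(commits_outpace_warming(warmup, commit_interval_ms=60000))  # commit every minute
```

When the check comes back true for your commit rate, lowering autowarmCount (or committing less often) is the first thing to try.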

Final Words
To find the best combination of cache settings, try different combinations and see how QTime changes. I recommend load testing by hitting the index with random queries and monitoring the response time. A quick and dirty way to do it:
I created a simple batch file that uses wget to query the Solr instance for a list of random words. To generate the list of wget statements I used a simple Excel formula. I then compare the log files from before and after changing my settings, using Notepad++ to find the lines that contain QTime and WinMerge to compare the results.

wget "http://localhost/solr/select?q=your random word&fl=id" -O - >> log.txt
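
If you would rather script the whole thing than combine a batch file with Excel, the same idea can be sketched in Python using only the standard library. This is an assumption-laden sketch, not the method described above: the helper names are hypothetical, the base URL is a placeholder for your own instance, and it assumes the default XML response, where QTime appears as an int element in the response header.

```python
import re
import urllib.parse
import urllib.request

# QTime as it appears in the header of Solr's default XML response.
QTIME_PATTERN = re.compile(r'<int name="QTime">(\d+)</int>')


def build_select_url(base_url, word):
    """Build a select URL for one query word, URL-encoding the parameters."""
    params = urllib.parse.urlencode({"q": word, "fl": "id"})
    return f"{base_url}/select?{params}"


def extract_qtime(response_text):
    """Return QTime in milliseconds, or None if the response has no QTime."""
    match = QTIME_PATTERN.search(response_text)
    return int(match.group(1)) if match else None


def run_load_test(base_url, words):
    """Query Solr once per word and return the QTime values that were found."""
    qtimes = []
    for word in words:
        with urllib.request.urlopen(build_select_url(base_url, word)) as response:
            qtime = extract_qtime(response.read().decode("utf-8"))
        if qtime is not None:
            qtimes.append(qtime)
    return qtimes


# No network needed for this part: show the URL and parse a made-up response.
print(build_select_url("http://localhost/solr", "hello world"))
print(extract_qtime('<lst name="responseHeader"><int name="QTime">7</int></lst>'))
```

Calling run_load_test("http://localhost/solr", words) with your list of random words before and after a settings change gives you two lists of QTime values to compare.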

Tuesday, February 15, 2011
