Here's a list of links on Hadoop / Hive tuning techniques. This list was compiled by @OngEmil.
- http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html - This one is the most interesting. It breaks down reducer logs line by line and shows how to optimize based on them using client-settable options.
- http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/ - A grab bag of tips. Many of these can be applied entirely on the client side.
- http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/ - Covers internal settings similar to those in the first link, and makes a good follow-on discussion.
- http://www.slideshare.net/cloudera/mr-perf - A nice slide deck with actual settings and explanations.
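
As a rough illustration of what "client-settable" means in the links above, here is a minimal sketch that assembles a `hadoop jar` command line with per-job `-D property=value` overrides. The property names are Hadoop 1.x-era settings picked for illustration and the values are placeholders, not recommendations from the linked posts. The jar name, class name, and paths are hypothetical. It also assumes the job's main class parses generic options (for example via `ToolRunner`), otherwise `-D` flags are not picked up.

```python
import shlex

# Hypothetical per-job overrides. Property names are Hadoop 1.x-era settings
# chosen for illustration; the values are placeholders, not tuned numbers.
OVERRIDES = {
    "mapred.reduce.tasks": 32,
    "io.sort.mb": 200,
    "mapred.compress.map.output": "true",
}


def hadoop_jar_command(jar, main_class, overrides, job_args=()):
    """Build an argv list for `hadoop jar` with -D overrides.

    Generic options (-D key=value) go after the main class and before the
    job's own arguments.
    """
    cmd = ["hadoop", "jar", jar, main_class]
    for key, value in sorted(overrides.items()):
        cmd += ["-D", f"{key}={value}"]
    cmd += list(job_args)
    return cmd


if __name__ == "__main__":
    # myjob.jar, com.example.MyJob, /in and /out are made-up names.
    cmd = hadoop_jar_command("myjob.jar", "com.example.MyJob", OVERRIDES, ["/in", "/out"])
    print(" ".join(shlex.quote(part) for part in cmd))
```

In a Hive session the analogous mechanism is `SET property=value;` before running the query. Either way the override applies only to that job or session and leaves the cluster-wide configuration untouched.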