tag:blogger.com,1999:blog-27204435145483546072024-03-20T19:23:54.098-07:00Data CatalystYash Ranadive's BlogAnonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.comBlogger113125tag:blogger.com,1999:blog-2720443514548354607.post-55037778815192890862014-10-10T00:00:00.003-07:002014-10-10T00:00:34.148-07:00Convert ISO-8601 time to UTC time in Hive<div dir="ltr" style="text-align: left;" trbidi="on">
In order to convert an ISO-8601 datetime string, e.g. "2013-06-10T12:31:00+0700", into UTC time "2013-06-10T05:31:00Z", you can do the following:<br />
<br />
<br />
select from_unixtime(iso8601_to_unix_timestamp('2013-06-10T12:31:00Z'), 'yyyy-MM-dd-HH-mm-ss') from table limit 1;<br />
<br />
<br />
For this to work you will need <a href="https://github.com/simplymeasured/hive-udf" target="_blank">Simply Measured's hive-udf library</a>, and you will need to add the following jars:<br />
<br />
hive> ADD JAR hdfs:///external-jars/commons-codec-1.9.jar;<br />
hive> ADD JAR hdfs:///external-jars/joda-time-2.2.jar;<br />
hive> ADD JAR hdfs:///external-jars/sm-hive-udf-1.0-SNAPSHOT.jar;<br />
<br />
hive>select from_unixtime(iso8601_to_unix_timestamp('2013-06-10T12:31:00Z'), 'yyyy-MM-dd-HH-mm-ss') from table limit 1;<br />
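For reference, the same conversion can be sketched in plain Ruby (a hypothetical stand-in, not part of the Hive UDF); note that Ruby's Time.iso8601 expects the colon form of the offset:

```ruby
require 'time'

# Parse an ISO-8601 string with a +07:00 offset and print it in UTC.
t = Time.iso8601('2013-06-10T12:31:00+07:00')
puts t.utc.strftime('%Y-%m-%dT%H:%M:%SZ')  # → 2013-06-10T05:31:00Z
```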
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-88978546915670564682014-02-25T13:35:00.000-08:002014-02-25T13:35:05.697-08:00Ruby Read Large files from the Network and write to File<div dir="ltr" style="text-align: left;" trbidi="on">
If you write a large file to disk the traditional way (open-uri), you will notice that memory usage spikes just before the file is written to disk.<br />
<br />
The workaround is to use the Net::HTTP.start method and write each chunk to disk as it is received.<br />
<br />
require 'net/http'<br />
<br />
Net::HTTP.start(end_point, { :use_ssl => true }) do |http|<br />
  http.request_get(resource) do |response|<br />
    # Write in binary mode, one chunk at a time, as the body streams in<br />
    File.open(filename(date), 'wb') do |io|<br />
      response.read_body do |chunk|<br />
        io.write chunk<br />
      end<br />
    end<br />
  end<br />
end<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-34076416732201863662014-02-18T22:48:00.002-08:002014-02-18T22:48:36.361-08:00Svbtle<div dir="ltr" style="text-align: left;" trbidi="on">
I'm looking for a good hosted Markdown blogging solution. I've written my first post using Svbtle at http://dubwubwub.svbtle.com/shell-yesterdays-dats<br />
<br />
It was fairly easy to write, though the Svbtle interface takes some getting used to. Overall it looks good. Looking forward to using more of it and documenting my experiences.</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-80129358474010013742014-02-13T22:38:00.000-08:002014-02-13T22:38:12.881-08:00Don't put your passwords on the command line<div dir="ltr" style="text-align: left;" trbidi="on">
Consider this shell command:<br />
<br />
$> mysql -u username -ppassword<br />
<br />
Passwords on the command line are a really BAD idea. Here's why:<br />
<br />
1. They are easily viewable in the process list by doing a <i>ps</i><br />
2. They are easily viewable in the shell history by doing a <i>history</i><br />
<br />
The same goes for version control: don't commit your passwords to Git. Git servers like GitHub are often visible to a wide audience within an organization. Always keep passwords, usernames, keys, etc. in external configuration files, or use a configuration framework such as Configatron.</div>
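As a sketch of the alternative (the file name and keys here are hypothetical), a Ruby script can load its credentials from a YAML file that never enters the repository:

```ruby
require 'yaml'

# Example contents of a credentials file -- in practice this lives on
# disk (e.g. ~/.myapp/credentials.yml, a hypothetical path) and is
# listed in .gitignore so it never reaches version control.
yaml = <<~YML
  username: app_user
  password: s3cret
YML

config = YAML.safe_load(yaml)
puts config['username']  # the script picks up credentials at runtime
```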
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-53559085478090825702014-02-11T20:26:00.003-08:002014-02-11T20:26:59.443-08:00Code Kata, Simple implementation of Bloom Filters<div dir="ltr" style="text-align: left;" trbidi="on">
We are doing <a href="http://en.wikipedia.org/wiki/Kata_(programming)" target="_blank">kata</a> at work this month. It is pretty exciting: you spend around 30 minutes every day learning a new technique or stretching your coding abilities. I tried the <a href="http://codekata.com/kata/kata05-bloom-filters/" target="_blank">code kata</a> website for some fun exercises.<br />
<br />
One thing that I've been particularly interested in over the last year or so is <a href="http://en.wikipedia.org/wiki/Bloom_filter" target="_blank">Bloom Filters</a>. Joins are so expensive! Bloom filters are simply amazing: they let you quickly determine that a value is NOT in a particular set.<br />
<br />
Here's my implementation of a Bloom filter in Ruby. It is not perfect: it could use a <a href="https://github.com/tyler/bitset" target="_blank">BitSet</a> implementation to save some memory, and it is not tested very thoroughly, but you get the idea.<br />
<br />
<br />
# http://codekata.com/kata/kata05-bloom-filters/<br />
<br />
class BloomFilter<br />
  def initialize(bitmap_size)<br />
    @bitmap = Array.new(bitmap_size, 0)<br />
    @bitmap_max_size = bitmap_size<br />
  end<br />
<br />
  # Each hash function starts from the byte sum of the object's string form<br />
  def hash_function_1(some_object)<br />
    some_object.inspect.each_byte.inject(:+) % @bitmap_max_size<br />
  end<br />
<br />
  def hash_function_2(some_object)<br />
    raw_val = some_object.inspect.each_byte.inject(:+)<br />
    (raw_val + raw_val.to_s.length**3) % @bitmap_max_size<br />
  end<br />
<br />
  def hash_function_3(some_object)<br />
    raw_val = some_object.inspect.each_byte.inject(:+)<br />
    (raw_val + raw_val.to_s.split.last.to_i**8) % @bitmap_max_size<br />
  end<br />
<br />
  def put(put_object)<br />
    @bitmap[hash_function_1(put_object)] = 1<br />
    @bitmap[hash_function_2(put_object)] = 1<br />
    @bitmap[hash_function_3(put_object)] = 1<br />
  end<br />
<br />
  # May return false positives, but never false negatives<br />
  def exists(put_object)<br />
    @bitmap[hash_function_1(put_object)] == 1 &&<br />
      @bitmap[hash_function_2(put_object)] == 1 &&<br />
      @bitmap[hash_function_3(put_object)] == 1<br />
  end<br />
end<br />
<br />
<br />
a = BloomFilter.new(1000)<br />
(1..1000).each { |x| a.put("test#{x}") }<br />
(1..1000).each { |x| puts x unless a.exists("test#{x}") }<br />
<br />
<br />
This was quickly hacked together, so let me know your comments on how it can be improved.</div>
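For sizing intuition, the standard approximation for a Bloom filter's false-positive rate is p ≈ (1 − e^(−kn/m))^k for m bits, k hash functions, and n inserted items. A quick Ruby check (not part of the kata code above):

```ruby
# Approximate false-positive rate of a Bloom filter with m bits,
# k hash functions, and n inserted items.
def bloom_fp_rate(m, k, n)
  (1 - Math.exp(-k.to_f * n / m))**k
end

# With a 1000-slot bitmap and 3 hash functions, even 100 items
# already give a noticeable false-positive rate:
puts bloom_fp_rate(1000, 3, 100).round(4)  # → 0.0174
```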
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-5921708465334451452013-11-06T11:46:00.001-08:002013-11-06T11:46:11.355-08:00Presto - Facebook's Data Crunching Monster<div dir="ltr" style="text-align: left;" trbidi="on">
I first came across Facebook's Presto a few months back at the "Analytics at Web Scale" conference at Facebook. Today they open-sourced it:<br />
<a href="https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920">https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920</a><br />
<br />
Really excited to see how this changes the big data landscape as analysts get hungrier for data and demand faster query execution.</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-45001170838249276052013-10-24T15:33:00.001-07:002013-10-24T15:33:14.352-07:00Copy gems from one server to another<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
First copy the gem list from the source box:<div>
ssh account@sourcegembox 'gem list' > /tmp/gem-list</div>
<div>
<br /></div>
<div>
Now install the gems:</div>
<div>
cat /tmp/gem-list | cut -d " " -f 1 | xargs sudo gem install</div>
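One caveat: `gem list` output includes version numbers (e.g. "rake (10.1.0, 0.9.6)"), and the cut/xargs one-liner drops them, so the target box gets the latest versions. A hypothetical Ruby refinement that pins the recorded versions instead (sample dump shown inline; in practice read it from /tmp/gem-list):

```ruby
# Turn `gem list` output into version-pinned install commands.
gem_list = <<~LIST
  rake (10.1.0, 0.9.6)
  json (1.8.1)
LIST

commands = gem_list.each_line.flat_map do |line|
  name, versions = line.match(/^(\S+) \((.+)\)/).captures
  versions.split(', ').map { |v| "gem install #{name} -v #{v}" }
end
puts commands
```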
<div>
<br /></div>
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-50089991405844826312013-10-16T17:06:00.001-07:002013-10-16T17:06:28.785-07:00Download Sqoop 2 from cloudera using apt-get install<div dir="ltr" style="text-align: left;" trbidi="on">
Today I had problems installing the Sqoop 2 server and client using apt-get. The problem was that apt wasn't able to fetch the correct package. I tried manually setting up the .list file in the /etc/apt/sources.list.d directory according to<br />
the Cloudera documentation <a href="http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.1/CDH4-Installation-Guide/cdh4ig_topic_4_4.html">http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.1/CDH4-Installation-Guide/cdh4ig_topic_4_4.html</a><br />
<br />
But that did not work either. Finally I was able to get it running by downloading and installing Cloudera's one-click-install Debian package from <a href="http://archive.cloudera.com/cdh4/one-click-install/precise/amd64/cdh4-repository_1.0_all.deb">http://archive.cloudera.com/cdh4/one-click-install/precise/amd64/cdh4-repository_1.0_all.deb</a>. Here's the link for lucid systems: <a href="http://archive.cloudera.com/cdh4/one-click-install/lucid/amd64/cdh4-repository_1.0_all.deb">http://archive.cloudera.com/cdh4/one-click-install/lucid/amd64/cdh4-repository_1.0_all.deb</a>.<br />
<br />
sudo dpkg -i cdh4-repository_1.0_all.deb<br />
<br />
Now try:<br />
<br />
sudo apt-get install sqoop2-server<br />
sudo apt-get install sqoop2-client</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-90950475780117238522013-09-05T19:44:00.000-07:002013-09-05T23:21:38.465-07:00Some links for Hadoop Performance Tuning<div dir="ltr" style="text-align: left;" trbidi="on">
Here's a list of links on Hadoop/Hive tuning techniques. This list was compiled by <a href="https://twitter.com/OngEmil" target="_blank">@OngEmil</a><br />
<br />
<br />
<blockquote style="text-align: -webkit-auto;" type="cite">
<ul style="background-color: rgba(255, 255, 255, 0);">
<li style="text-align: left;"><a href="http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html">http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html</a> – This one is the most interesting. It breaks down reducer logs line-by-line and shows how to optimize based on them using client-settable options.</li>
<li style="text-align: left;"><a href="http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/">http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/</a> - A grab bag of tips. Many of these are usable entirely at client-side.</li>
<li style="text-align: left;"><a href="http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/</a> - The internal settings are similar to the first link, but a good follow-on discussion.</li>
<li style="text-align: left;"><a href="http://www.slideshare.net/cloudera/mr-perf">http://www.slideshare.net/cloudera/mr-perf</a> – A nice slideshare with actual settings and explanations</li>
</ul>
</blockquote>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-18772601157486067452013-09-02T18:12:00.002-07:002013-09-02T18:16:16.377-07:00Hadoop monitoring and poor man's profiling<div dir="ltr" style="text-align: left;" trbidi="on">
<b>Hadoop Monitoring</b><br />
I created a simple script that monitors the Hadoop cluster for changes in the number of nodes. If you run it from an external tool such as Jenkins, you can send yourself an error email whenever the script exits with error code 1; you can also extend the script to send mail itself if you don't want to use an external tool. Since the hadoop dfsadmin -report command fails when the namenode is down, the script also alerts you when the namenode is unhappy. There are other ways to monitor a cluster, such as Cloudera Manager, but we decided to build our own tooling for the time being. You can also extend the script to check whatever you like on the nodes; I'm checking tmp directory space, since it often fills up the cluster when bad queries are executed on Hive.<br />
<br />
Here's the gist:<br />
<a href="https://gist.github.com/yash-ranadive/6418644">https://gist.github.com/yash-ranadive/6418644</a><br />
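The heart of such a check is just parsing the `hadoop dfsadmin -report` output and comparing the live datanode count against the expected cluster size. A minimal sketch (sample report text inlined here; the real script shells out to hadoop):

```ruby
# Extract the live datanode count from `hadoop dfsadmin -report` output.
def live_datanodes(report)
  report[/Datanodes available:\s*(\d+)/, 1].to_i
end

sample = "Configured Capacity: 1000\nDatanodes available: 9 (10 total, 1 dead)\n"
expected = 10
live = live_datanodes(sample)
puts "ALERT: only #{live}/#{expected} datanodes up" if live < expected
```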
<br />
<b>Poor Man's Profiling</b><br />
There are many ways to profile your Hadoop cluster to see what is causing slowness. One technique is to take thread dumps of the Java processes and see which methods appear most often. Take a few (10-15) thread dumps at random intervals; if you see the same methods being called over and over again, you can infer where your app is spending most of its time.<br />
<br />
We did one such exercise when we saw our Hive queries stuck in the last reduce phase. The single final reducer is almost always the bottleneck. To take the dump:<br />
1. Navigate to the resource manager (YARN) and click on the query you are running.<br />
2. Click on the application master link, then on the Map/Reduce links.<br />
3. More likely than not it is the reduce phase of the Hive-generated MapReduce job that runs slow. Click on the reduce tasks and find the server whose container has been running the longest.<br />
4. SSH in to that box and run the following multiple times at irregular intervals:<br />
<br />
<div style="text-align: center;">
<i>killall -QUIT java </i></div>
<div style="text-align: left;">
<i><br /></i></div>
<div style="text-align: left;">
This dumps the threads to STDOUT in the logs; you can navigate to the box's logs in the resource manager and look at the dumps. We realized that the compress method of the ZLIB compression class appeared most frequently in our dumps. Again, this is poor man's profiling and there are better ways to profile a Java process, but it worked for us: we got a performance increase after switching to LZO.</div>
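The counting step itself can be sketched in a few lines of Ruby (dump text inlined here for illustration; in practice you would read the log files):

```ruby
# Tally how often each stack frame appears across several thread dumps;
# the most frequent frames suggest where the job spends its time.
dumps = [
  "\"main\" prio=10\n  at java.util.zip.Deflater.deflate(Native Method)\n  at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java)\n",
  "\"main\" prio=10\n  at java.util.zip.Deflater.deflate(Native Method)\n  at java.io.DataOutputStream.write(DataOutputStream.java)\n"
]

frames = Hash.new(0)
dumps.each do |dump|
  dump.each_line { |l| frames[l.strip] += 1 if l.lstrip.start_with?('at ') }
end

frame, count = frames.max_by { |_, n| n }
puts "#{count}x #{frame}"
```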
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com2tag:blogger.com,1999:blog-2720443514548354607.post-25632078216638026822013-09-02T18:05:00.001-07:002013-09-02T18:16:33.390-07:00How to store encrypted passwords in a YAML file Ruby<div dir="ltr" style="text-align: left;" trbidi="on">
YAML works great for storing config information for Ruby scripts. First, if you are writing Ruby scripts, I highly recommend storing your authentication information in an external file; checking in auth info is a bad idea. If your source control system is ever compromised, the attacker has access to all your usernames and passwords. <br />
<div>
<br /></div>
<div>
If you've tried to store any sort of non-UTF-8 password in YAML, you know how painful it is to retrieve. In fact, I haven't found a way to retrieve it at all: YAML reads only UTF-8. </div>
<div>
<br />
<b>Encryption</b></div>
<div>
The first thing to make sure of is that you are storing the password in the correct encoding. If you use a gem like 'encryptor', check the encoding of the encrypted password first:<br />
encrypted_pass = Encryptor.encrypt(blah blah).<i><b>encoding</b></i><br />
<br />
If it is not UTF-8 you can force encoding by doing<br />
encrypted_pass = Encryptor.encrypt(blah blah).<b><i>force_encoding</i></b>('UTF-8')<br />
<br />
<div class="p1">
I like to Base64-encode and then UTF-8-encode the password coming out of shuber's encryptor gem. The gem by default spits out an ASCII-8BIT string.</div>
<div class="p1">
<br /></div>
<div class="p1">
# To encrypt a fresh password to store in a file run:</div>
<div class="p1">
Base64.encode64(Encryptor.encrypt(passwd,:key => secret_key, :algorithm => 'aes-256-ecb')).force_encoding('UTF-8')</div>
<br />
<div class="p2">
<span class="s1"><br /></span></div>
<div class="p2">
<span class="s1">Copy and paste the above password in the YAML file.</span></div>
<div class="p2">
<span class="s1"><br /></span></div>
<div class="p2">
<span class="s1"><br /></span></div>
<div class="p2">
<span class="s1"><b>Decryption</b></span></div>
<div class="p2">
<span class="s1"># To decrypt the password from the YAML file</span></div>
<div class="p2">
<span class="s1">account_config = YAML.load_file('name_of_file.yml')</span></div>
<div class="p2">
</div>
<div class="p1">
account_password = Encryptor.decrypt(Base64.decode64(account_config[<span class="s1">'password'</span>].force_encoding(<span class="s1">'ASCII-8BIT'</span>)), :key => secret_key, :algorithm => <span class="s1">'aes-256-ecb'</span>)</div>
</div>
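Putting the pieces together, here is a hedged round-trip sketch using OpenSSL directly rather than the encryptor gem, with the same aes-256-ecb algorithm as above (note that ECB mode has known weaknesses; prefer an IV-based mode for new code):

```ruby
require 'openssl'
require 'base64'
require 'yaml'

# Derive a 32-byte key, encrypt, and Base64-encode so the ciphertext
# is YAML-safe UTF-8.
key = OpenSSL::Digest.digest('SHA256', 'secret_key')

cipher = OpenSSL::Cipher.new('aes-256-ecb')
cipher.encrypt
cipher.key = key
stored = Base64.encode64(cipher.update('my_password') + cipher.final).force_encoding('UTF-8')

# Round-trip through YAML, then reverse the steps to decrypt.
yaml_value = YAML.safe_load({ 'password' => stored }.to_yaml)['password']

decipher = OpenSSL::Cipher.new('aes-256-ecb')
decipher.decrypt
decipher.key = key
puts decipher.update(Base64.decode64(yaml_value)) + decipher.final
```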
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-53986109739038194892013-09-02T17:49:00.002-07:002013-09-02T18:16:45.122-07:00Simple Cheat Sheet Search using Grep for your Mac<div dir="ltr" style="text-align: left;" trbidi="on">
Often I find myself referring to the same list of commands for utilities such as git, hadoop, mysql, ruby, tmux, etc. The usual way is to just google them: you are very likely to find what you're looking for in a few clicks. A faster way is to keep a cheat sheet of the commands you use often and search that text file when you need one.<br />
<br />
The problem with this approach is that it is almost as painful as the first. Imagine you have 8-10 tabs open and your text-editor tab is somewhere in the middle: that's four keystrokes to reach it, one more to search for the text, then scrolling to find the entry you want. Too many keystrokes and mouse events, and it's just really frustrating.<br />
<br />
I created a simple hack that greps for the line you want and shows the two lines beneath each match.<br />
<br />
dn.sh<br />
#!/bin/sh<br />
read search_string<br />
grep -n -i -A 2 "$search_string" ~/cheatsheet.txt<br />
<div>
<br /></div>
<br />
So, for example, if your cheat sheet has a command description followed by the command usage, looking something like this:<br />
<br />
. . .<br />
Git Squash Commit (prefix every line to squash with s)<br />
git rebase -i master<br />
<br />
Git show remotes<br />
git remote show<br />
<br />
Git branches<br />
https://github.com/Kunena/Kunena-2.0/wiki/Create-a-new-branch-with-git-and-manage-branches<br />
<br />
Git Delete Branch<br />
git branch -D <branch name=""></branch><br />
. . .<br />
<br />
Then you can open Spotlight with Command + Space, type dn, and hit enter. Type your search string "squash" and you will see the following:<br />
315:Git <b>Squash</b> Commit (prefix every line to <b>squash</b> with s)<br />
316-git rebase -i master<br />
317-</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-4053600241828910102013-09-02T17:17:00.000-07:002013-09-02T18:17:03.567-07:00Random JSON Payload generator in Ruby<div dir="ltr" style="text-align: left;" trbidi="on">
Here's a quick random JSON generator in Ruby for testing your code:<br />
<br />
require 'json'<br />
<br />
# Get a random string of the given length<br />
def get_random_string(character_count)<br />
  o = [('a'..'z'), ('A'..'Z')].map { |i| i.to_a }.flatten<br />
  (0...character_count).map { o[rand(o.length)] }.join<br />
end<br />
<br />
# Get a random JSON payload; inputs are the number of fields and the size of each field<br />
def get_random_json_payload(number_of_fields, field_size)<br />
  fields = {}<br />
  # Generate the payload hash<br />
  (1..number_of_fields).each do |field_number|<br />
    fields['field_' + field_number.to_s] = get_random_string(field_size)<br />
  end<br />
<br />
  fields.to_json<br />
end<br />
<br />
puts get_random_json_payload(2, 10)<br />
<br />
The output will look something like this:<br />
{"field_1":"VUQZDpYRgA","field_2":"LtMQYvSZca"}<br />
<br />
<br />
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-17994142785799581102013-09-02T17:14:00.001-07:002013-09-02T18:17:19.014-07:00How to set up a simple proxy server on Digital Ocean/ VPS<div dir="ltr" style="text-align: left;" trbidi="on">
SSH in to your Digital Ocean droplet and run the following:<br />
<br />
sudo apt-get install tinyproxy #install tinyproxy<br />
which tinyproxy # make sure tinyproxy is installed<br />
ps -ef | grep proxy # make sure tinyproxy is running<br />
<br />
Now forward a port from your local machine to the remote proxy on the droplet:<br />
ssh -N root@dro.ple.tip.xxx -L local_port:localhost:remote_port<br />
e.g. ssh -N root@198.199.xxx.xxx -L 8000:localhost:8888<br />
<br />
Send a request to localhost at port 8000 and see if it appears in the tinyproxy logs. If you are using Ruby (e.g. Mechanize) to crawl or scrape websites, you can set the proxy on the agent:<br />
<br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;">agent.set_proxy 'localhost', 8000</span><br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;">agent.get("somewebsite")</span><br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 24.4375px;"><br /></span></span>
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 24.4375px;">See the results on your droplet appear in:</span></span><br />
<span style="background-color: white; color: #444444; font-family: arial, sans-serif; font-size: x-small; line-height: 24.44444465637207px;">/var/log/</span><span style="background-color: white; color: #444444; font-family: arial, sans-serif; font-size: x-small; font-weight: bold; line-height: 24.44444465637207px;">tinyproxy</span><span style="background-color: white; color: #444444; font-family: arial, sans-serif; font-size: x-small; line-height: 24.44444465637207px;">.</span><span style="background-color: white; color: #444444; font-family: arial, sans-serif; font-size: x-small; font-weight: bold; line-height: 24.44444465637207px;">log</span></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com1tag:blogger.com,1999:blog-2720443514548354607.post-41823441364898512832013-09-02T16:52:00.004-07:002013-09-02T18:17:46.280-07:00Create a Simple HTTP Server on a directory to bypass same origin policy<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Let's say you are developing a website that uses JavaScript on your laptop. Often the browser will not allow you to access certain resources because of the <a href="http://en.wikipedia.org/wiki/Same-origin_policy" target="_blank">same-origin policy</a>. And often you are working on a bunch of HTML files and just want to serve a few of them over HTTP. <br />
<br />
You can do it quickly by creating a simple HTTP server that will open access to all files in the directory over HTTP. Here's the nifty python command that does that:<br />
<br />
python -m SimpleHTTPServer 8900<br />
<br />
Where 8900 is the port you want the content served on.<br />
<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-62065185828504179932013-08-19T15:33:00.002-07:002013-09-02T18:18:21.972-07:00Find files and list their details<div dir="ltr" style="text-align: left;" trbidi="on">
One of the things I find myself doing often on *nix systems is finding files and looking at their last-modified times.<br />
<br />
Finding the files is easy<br />
<br />
$ find / -name somefile.txt<br />
/etc/files/somefile.txt<br />
<br />
Now if you want to get details for the files such as last modified, etc.<br />
<br />
$ find / -name somefile.txt | xargs ls -la<br />
-rw-r--r-- 1 user group 1129 Nov 3 2011 /etc/files/somefile.txt</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-73715381002201215432013-07-27T00:27:00.001-07:002013-07-27T00:27:53.316-07:00Leap Motion Experience sucks<div dir="ltr" style="text-align: left;" trbidi="on">
I hate the experience behind Leap Motion. From beginning to end it has been crapulous. They are really, really bad at running the company. Just today I got the controller in the mail and couldn't wait to try it out. First, the controller wasn't recognized by the bundled software, and I had to disconnect the damn thing at least 10 times before it actually recognized the device.<br />
<br />
After it was recognized, I went through the orientation, which was OK except that it couldn't figure out the position of my fingertips during certain hand gestures. The next step was to download some apps from Airspace, which is Leap Motion's app store. I completed the form for a new account and it took me back to the login screen without telling me whether I had registered successfully. I tried logging in with my new credentials and ehhh....can't. I tried multiple times, same thing.<br />
<br />
This is not the first time the company has disappointed me. The delay in shipment was very frustrating, and on top of that I had to re-enter my credit card information at least three times, because every time I did they'd send an email saying it didn't go through correctly.<br />
<br />
Really pissing off. It is fine and dandy to have slick demos and such, but you really don't need any of that: their first iteration should've been something simple, aimed mainly at developers. The whole experience is really frustrating. Before buying, I advise people to wait until the company gets its act together.<br />
<br />
Really shameful to see how poorly they've handled this.</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com11tag:blogger.com,1999:blog-2720443514548354607.post-72043161324583580262013-07-10T23:17:00.003-07:002013-07-10T23:17:51.748-07:00Adding custom jar files for UDFs in Hive cli using hiverc<div dir="ltr" style="text-align: left;" trbidi="on">
To add custom functionality such as UDF jars while using the Hive CLI, just add your settings to the .hiverc file.<br />
<br />
So let's say you are developing a UDF on your machine and want to test how it behaves in Hive: simply put the lines below in your .hiverc and they will be loaded whenever you start a new Hive CLI session.<br />
<br />
add jar CustomUDFLib.jar;<br />
create temporary function function1 AS 'com.yourcompany.hive.udf.function1';<br />
create temporary function function2 AS 'com.yourcompany.hive.udf.function2';<br />
<br />
<br />
The .hiverc file, like all rc files, must reside in your home directory.<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-67123719279844324592013-07-10T23:13:00.001-07:002013-07-10T23:13:06.119-07:00Hive Metastore Data Model<div dir="ltr" style="text-align: left;" trbidi="on">
The Hive metastore data model is really interesting once you start to dig in to users, permissions, roles, etc.<br />
<br />
Here's a link to the ER diagram:<br />
<a href="https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf">https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf</a></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-77842805679375449552013-07-10T23:09:00.001-07:002013-07-16T10:47:51.339-07:00Use Custom UDFs with Hue<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
If you want to use custom UDFs with hue, here are the steps you need to follow:<br />
<br />
1. Generate the UDF jar<br />
2. Drop the jar into HDFS<br />
hadoop dfs -copyFromLocal udf.jar /path/to/file/in/hdfs<br />
3. In order for Hue to recognize the jar, you need to add the hive.aux.jars.path property to the hive-site.xml file on the Hue box:<br />
&lt;property&gt;<br />
&nbsp;&nbsp;&lt;name&gt;hive.aux.jars.path&lt;/name&gt;<br />
&nbsp;&nbsp;&lt;value&gt;hdfs:///path/to/file/in/hdfs&lt;/value&gt;<br />
&lt;/property&gt;<br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 16px;"></span></span></div>
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 16px;">4. Create the function definition using the Hue Query Editor (there's a section on the left-hand side to add jars, functions, etc.)</span></span><br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 16px;"><br /></span></span>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com2tag:blogger.com,1999:blog-2720443514548354607.post-22250152643090633642013-07-10T22:05:00.003-07:002013-07-10T22:05:36.384-07:00Mysql difference between := and = in defining user variables<div dir="ltr" style="text-align: left;" trbidi="on">
In MySQL you will notice both := and = used as assignment operators for user variables. But which one to use when? This always confused me.<br />
<br />
The short answer: you have to use := when you assign inside a SELECT clause, e.g.<br />
SELECT @var := some value<br />
SELECT @last_date := IFNULL(MAX(date_observed), '2013-05-01') FROM mart_dev_console_stats<br />
<div>
<br /></div>
<br />
When using the SET statement, you have to use =, e.g.<br />
SET @var = some value<br />
<br />
Another thing to note: in stored procedures, a SELECT @var := some_value actually returns a result set to the client. If you are using the variable for further processing and don't want to send it to the client, I suggest using<br />
<br />
SELECT something into @var<br />
FROM table<br />
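Combining this with the earlier example, the no-result-set version would look like:

```sql
-- store the newest observed date without returning a result set
SELECT IFNULL(MAX(date_observed), '2013-05-01')
INTO @last_date
FROM mart_dev_console_stats;

-- @last_date can now be used in later statements
SELECT * FROM mart_dev_console_stats WHERE date_observed > @last_date;
```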
<br />
See <a href="http://dev.mysql.com/doc/refman/5.0/en/select-into.html">http://dev.mysql.com/doc/refman/5.0/en/select-into.html</a> for more info<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-76247718922761277672013-07-02T00:14:00.001-07:002013-07-02T00:14:05.813-07:00Delete Jenkins Jobs using CURL<div dir="ltr" style="text-align: left;" trbidi="on">
It is easy to automate the deletion of multiple jobs in Jenkins. In Google Chrome, open Developer Tools > Network. Now click the delete-job link on one of the Jenkins jobs you want to delete.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYPv227kWjaUvIEGORGi5eRm8qH_qgdIkZ5PIbolpoDWna8-gvRQL-GSh9PLlEpyWUwrnlsj8L4oRXOAnfEaoiNhswKyILhcjyjhx_05OhGFxuwnojiyjST7wPXHlfOzND2nW1XsFI-Mg/s874/Screen+Shot+2013-07-02+at+12.05.44+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYPv227kWjaUvIEGORGi5eRm8qH_qgdIkZ5PIbolpoDWna8-gvRQL-GSh9PLlEpyWUwrnlsj8L4oRXOAnfEaoiNhswKyILhcjyjhx_05OhGFxuwnojiyjST7wPXHlfOzND2nW1XsFI-Mg/s320/Screen+Shot+2013-07-02+at+12.05.44+AM.png" width="218" /></a></div>
<br />
Now inside the Developer Tools > Network section you will see a POST request being made to the Jenkins server. Right-click on the POST request and choose Copy as cURL.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4t2mFjuZkWRcFpRTlhLyNGv14f0RST7sFdsAZbVGSMyWjVVPCuwfLEIHcs30OL-a4LV15Bj7cEjtE0YQR3VDEaDr4etuHw16a4UJ7KSVl-i5m-mS2-jeoVoj_hDCf0Fj3LUW6GfkZ1zg/s1600/Screen+Shot+2013-07-02+at+12.10.07+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4t2mFjuZkWRcFpRTlhLyNGv14f0RST7sFdsAZbVGSMyWjVVPCuwfLEIHcs30OL-a4LV15Bj7cEjtE0YQR3VDEaDr4etuHw16a4UJ7KSVl-i5m-mS2-jeoVoj_hDCf0Fj3LUW6GfkZ1zg/s320/Screen+Shot+2013-07-02+at+12.10.07+AM.png" width="320" /></a></div>
<br />
Your string will look something like<br />
curl "http://servername:port/job/job_name/doDelete" -X POST -H "Cookie: JSESSIONID..." -H "Referer: http://servername:port/job/job_name/" -H "Connection: keep-alive" -H "Content-Length: 0"<br />
<br />
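To delete many jobs at once, the copied command can be wrapped in a small shell loop. This is a sketch: the server address and cookie are placeholders you must paste in from your own copied cURL string, and the script only prints the commands (a dry run) so you can review them before executing anything.

```shell
#!/bin/bash
# Placeholders: fill these in from your own "Copy as cURL" string
SERVER="http://servername:port"
COOKIE="JSESSIONID..."

# Build the doDelete URL for a given job name
delete_url() {
  echo "${SERVER}/job/${1}/doDelete"
}

# Dry run: print the curl command for each job instead of executing it.
# When the output looks right, pipe it to sh (or drop the echo).
for job in old_job_1 old_job_2 old_job_3; do
  echo curl -X POST -H "Cookie: ${COOKIE}" "$(delete_url "${job}")"
done
```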
Replace job_name with whatever job you want to delete and run the curl commands as part of a script. Voila! Fast deletes in Jenkins!</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com2tag:blogger.com,1999:blog-2720443514548354607.post-39309576479557289512013-06-22T11:57:00.001-07:002013-08-14T11:47:35.255-07:00Upstart script for making hive as a service<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">Put this .conf file in your /etc/init folder as hiveserver.conf; the file name determines the service name used in the start/stop commands below. </span><br />
<br style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">description "Hive Server"</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">start on runlevel [2345]</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">stop on runlevel [016]</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">expect fork </span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">script</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> echo "Starting Hive Service"</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> export HIVE_HOME=/usr/lib/hive-0.11.0-bin</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> export HIVE_CONF_DIR=/etc/hive/conf/hiveserver1</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> $HIVE_HOME/bin/hive --service hiveserver > /var/log/hive/hiveserver.out 2>&1 &</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">end script</span></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><br /></span>
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><br /></span><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">To start/stop/check your service you can use</span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">start hiveserver</span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">stop hiveserver</span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">status hiveserver</span></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-83170251967395314832013-06-08T15:05:00.003-07:002013-06-08T15:05:27.910-07:00Write a simple Service using Upstart on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on">
There are two ways of writing a service on Ubuntu. One is by dropping a complex config file in init.d and the other is by using "upstart".<br />
<br />
Upstart makes it effortless to write services. All you have to do is put a configuration file in /etc/init and upstart takes care of the rest. You can then start and stop services by using:<br />
start test<br />
stop test<br />
<br />
To get the status of the service, you can do<br />
status test<br />
<br />
Here's a sample service that calls a shell script that prints strings on the console:<br />
<br />
################ test.conf<br />
description "Test Service"<br />
author "Yash Ranadive &lt;yash@lookout.com&gt;"<br />
<br />
start on runlevel [2345]<br />
stop on runlevel [016]<br />
<br />
respawn<br />
<br />
script<br />
/home/yranadive/print_delay.sh >> /tmp/print_delay.out 2>&1<br />
end script<br />
<br />
<br />
You need to know how the program you are trying to run behaves, e.g. whether or not it forks after it starts. If the program daemonizes itself, you will need to add expect daemon to the conf file.<br />
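For example, for a program that daemonizes itself, the conf file might look like this (the description, path, and name are made up for illustration):

```conf
description "My Daemon"

start on runlevel [2345]
stop on runlevel [016]

# the daemon forks twice as it detaches, so tell upstart
# to follow the second fork and track the right pid
expect daemon

exec /usr/local/bin/mydaemon
```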
<br />
Here's the shell script print_delay.sh<br />
<br />
#!/bin/bash<br />
c=1<br />
while [ $c -le 5 ]<br />
do<br />
echo "Test line at $(date)"<br />
sleep 10<br />
(( c++ ))<br />
done<br />
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-16156624309025245142013-05-22T00:06:00.002-07:002013-05-22T00:06:58.547-07:00Setting up Tmux on the mac<div dir="ltr" style="text-align: left;" trbidi="on">
Tmux is used to multiplex several consoles while you are logged into a Linux box or in your Mac terminal. It allows you to create split panes so you can see more on the screen. You can watch the output of top in one pane while you modify some code in another.<br />
<br />
Panes are easy to create, and moving between them is quick. See the thoughtbot link below for how to use Tmux.<br />
<br />
Here are the links you need to set up tmux on your Mac. Make sure you follow the thoughtbot tutorial to customize your ~/.tmux.conf file. If you are coming from the GNU Screen world, you will likely want to set your prefix key to Ctrl-A.<br />
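For example, a minimal ~/.tmux.conf that moves the prefix from the default Ctrl-B to Ctrl-A looks like this:

```conf
# use Ctrl-a as the prefix key, like GNU Screen
set -g prefix C-a
unbind C-b

# press Ctrl-a twice to send a literal Ctrl-a to the program in the pane
bind C-a send-prefix
```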
<br />
https://blogs.oracle.com/unixben/entry/install_tmux_on_mac_os<br />
<br />
http://robots.thoughtbot.com/post/2641409235/a-tmux-crash-course</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0