tag:blogger.com,1999:blog-27204435145483546072024-03-20T19:23:54.098-07:00Data CatalystYash Ranadive's BlogAnonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.comBlogger113125tag:blogger.com,1999:blog-2720443514548354607.post-55037778815192890862014-10-10T00:00:00.003-07:002014-10-10T00:00:34.148-07:00Convert ISO-8601 time to UTC time in Hive<div dir="ltr" style="text-align: left;" trbidi="on">
In order to convert an ISO-8601 datetime string, e.g. "2013-06-10T12:31:00+0700", into UTC time "2013-06-10T05:31:00Z", you can do the following:<br />
<br />
<br />
select from_unixtime(iso8601_to_unix_timestamp('2013-06-10T12:31:00Z'), 'yyyy-MM-dd-HH-mm-ss') from table limit 1;<br />
<br />
<br />
For this to work you will need <a href="https://github.com/simplymeasured/hive-udf" target="_blank">Simply Measured's hive-udf library</a>, and you will need to add the following jars:<br />
<br />
hive> ADD JAR hdfs:///external-jars/commons-codec-1.9.jar;<br />
hive> ADD JAR hdfs:///external-jars/joda-time-2.2.jar;<br />
hive> ADD JAR hdfs:///external-jars/sm-hive-udf-1.0-SNAPSHOT.jar;<br />
<br />
hive>select from_unixtime(iso8601_to_unix_timestamp('2013-06-10T12:31:00Z'), 'yyyy-MM-dd-HH-mm-ss') from table limit 1;<br />
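For reference, the same conversion can be sketched in plain Ruby (a hypothetical stand-in, not part of the Hive UDF); note that Ruby's Time.iso8601 expects the colon form of the offset:

```ruby
require 'time'

# Parse an ISO-8601 string with a +07:00 offset and print it in UTC.
t = Time.iso8601('2013-06-10T12:31:00+07:00')
puts t.utc.strftime('%Y-%m-%dT%H:%M:%SZ')  # → 2013-06-10T05:31:00Z
```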
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-88978546915670564682014-02-25T13:35:00.000-08:002014-02-25T13:35:05.697-08:00Ruby Read Large files from the Network and write to File<div dir="ltr" style="text-align: left;" trbidi="on">
If you write a large file to disk the traditional way (open-uri), you will notice that memory usage spikes just before the file is written to disk.<br />
<br />
The workaround is to use the Net::HTTP.start method and write each chunk to disk as it is received.<br />
<br />
require 'net/http'<br />
<br />
Net::HTTP.start(end_point, { :use_ssl => true }) do |http|<br />
  http.request_get(resource) do |response|<br />
    # Write in binary mode, one chunk at a time, as the body streams in<br />
    File.open(filename(date), 'wb') do |io|<br />
      response.read_body do |chunk|<br />
        io.write chunk<br />
      end<br />
    end<br />
  end<br />
end<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-34076416732201863662014-02-18T22:48:00.002-08:002014-02-18T22:48:36.361-08:00Svbtle<div dir="ltr" style="text-align: left;" trbidi="on">
I'm looking for a good hosted Markdown blogging solution. I've written my first post using Svbtle at http://dubwubwub.svbtle.com/shell-yesterdays-dats<br />
<br />
It was fairly easy to write, though the Svbtle interface takes some getting used to. Overall it looks good. Looking forward to using more of it and documenting my experiences.</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-80129358474010013742014-02-13T22:38:00.000-08:002014-02-13T22:38:12.881-08:00Don't put your passwords on the command line<div dir="ltr" style="text-align: left;" trbidi="on">
Consider this shell command:<br />
<br />
$> mysql -u username -ppassword<br />
<br />
Passwords on the command line are a really BAD idea. Here's why:<br />
<br />
1. They are easily viewable in the process list by doing a <i>ps</i><br />
2. They are easily viewable in the shell history by doing a <i>history</i><br />
<br />
The same goes for version control: don't commit your passwords to Git. Git servers like GitHub are often visible to a wide audience within an organization. Always keep passwords, usernames, keys, etc. in external configuration files, or use a configuration framework such as Configatron.</div>
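As a sketch of the alternative (the file name and keys here are hypothetical), a Ruby script can load its credentials from a YAML file that never enters the repository:

```ruby
require 'yaml'

# Example contents of a credentials file -- in practice this lives on
# disk (e.g. ~/.myapp/credentials.yml, a hypothetical path) and is
# listed in .gitignore so it never reaches version control.
yaml = <<~YML
  username: app_user
  password: s3cret
YML

config = YAML.safe_load(yaml)
puts config['username']  # the script picks up credentials at runtime
```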
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-53559085478090825702014-02-11T20:26:00.003-08:002014-02-11T20:26:59.443-08:00Code Kata, Simple implementation of Bloom Filters<div dir="ltr" style="text-align: left;" trbidi="on">
We are doing <a href="http://en.wikipedia.org/wiki/Kata_(programming)" target="_blank">kata</a> at work this month. It is pretty exciting: you spend around 30 minutes every day learning a new technique or stretching your coding abilities. I tried the <a href="http://codekata.com/kata/kata05-bloom-filters/" target="_blank">code kata</a> website for some fun exercises.<br />
<br />
One thing that I've been particularly interested in over the last year or so is <a href="http://en.wikipedia.org/wiki/Bloom_filter" target="_blank">Bloom Filters</a>. Joins are so expensive! Bloom filters are simply amazing: they let you quickly determine that a value is NOT in a particular set.<br />
<br />
Here's my implementation of a Bloom filter in Ruby. It is not perfect: it could use a <a href="https://github.com/tyler/bitset" target="_blank">BitSet</a> implementation to save some memory, and it is not tested very thoroughly, but you get the idea.<br />
<br />
<br />
# http://codekata.com/kata/kata05-bloom-filters/<br />
<br />
class BloomFilter<br />
  def initialize(bitmap_size)<br />
    @bitmap = Array.new(bitmap_size, 0)<br />
    @bitmap_max_size = bitmap_size<br />
  end<br />
<br />
  # Each hash function starts from the byte sum of the object's string form<br />
  def hash_function_1(some_object)<br />
    some_object.inspect.each_byte.inject(:+) % @bitmap_max_size<br />
  end<br />
<br />
  def hash_function_2(some_object)<br />
    raw_val = some_object.inspect.each_byte.inject(:+)<br />
    (raw_val + raw_val.to_s.length**3) % @bitmap_max_size<br />
  end<br />
<br />
  def hash_function_3(some_object)<br />
    raw_val = some_object.inspect.each_byte.inject(:+)<br />
    (raw_val + raw_val.to_s.split.last.to_i**8) % @bitmap_max_size<br />
  end<br />
<br />
  def put(put_object)<br />
    @bitmap[hash_function_1(put_object)] = 1<br />
    @bitmap[hash_function_2(put_object)] = 1<br />
    @bitmap[hash_function_3(put_object)] = 1<br />
  end<br />
<br />
  # May return false positives, but never false negatives<br />
  def exists(put_object)<br />
    @bitmap[hash_function_1(put_object)] == 1 &&<br />
      @bitmap[hash_function_2(put_object)] == 1 &&<br />
      @bitmap[hash_function_3(put_object)] == 1<br />
  end<br />
end<br />
<br />
<br />
a = BloomFilter.new(1000)<br />
(1..1000).each { |x| a.put("test#{x}") }<br />
(1..1000).each { |x| puts x unless a.exists("test#{x}") }<br />
<br />
<br />
This was quickly hacked together, so let me know your comments on how it can be improved.</div>
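For sizing intuition, the standard approximation for a Bloom filter's false-positive rate is p ≈ (1 − e^(−kn/m))^k for m bits, k hash functions, and n inserted items. A quick Ruby check (not part of the kata code above):

```ruby
# Approximate false-positive rate of a Bloom filter with m bits,
# k hash functions, and n inserted items.
def bloom_fp_rate(m, k, n)
  (1 - Math.exp(-k.to_f * n / m))**k
end

# With a 1000-slot bitmap and 3 hash functions, even 100 items
# already give a noticeable false-positive rate:
puts bloom_fp_rate(1000, 3, 100).round(4)  # → 0.0174
```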
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-5921708465334451452013-11-06T11:46:00.001-08:002013-11-06T11:46:11.355-08:00Presto - Facebook's Data Crunching Monster<div dir="ltr" style="text-align: left;" trbidi="on">
I first came across Facebook's Presto a few months back at the "Analytics at Web Scale" conference at Facebook. Today they open-sourced it:<br />
<a href="https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920">https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920</a><br />
<br />
Really excited to see how this changes the big data landscape as analysts get hungrier for data and demand faster query execution.</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-45001170838249276052013-10-24T15:33:00.001-07:002013-10-24T15:33:14.352-07:00Copy gems from one server to another<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
First copy the gem list from the source box:<div>
ssh account@sourcegembox 'gem list' > /tmp/gem-list</div>
<div>
<br /></div>
<div>
Now install the gems:</div>
<div>
cat /tmp/gem-list | cut -d " " -f 1 | xargs sudo gem install</div>
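One caveat: `gem list` output includes version numbers (e.g. "rake (10.1.0, 0.9.6)"), and the cut/xargs one-liner drops them, so the target box gets the latest versions. A hypothetical Ruby refinement that pins the recorded versions instead (sample dump shown inline; in practice read it from /tmp/gem-list):

```ruby
# Turn `gem list` output into version-pinned install commands.
gem_list = <<~LIST
  rake (10.1.0, 0.9.6)
  json (1.8.1)
LIST

commands = gem_list.each_line.flat_map do |line|
  name, versions = line.match(/^(\S+) \((.+)\)/).captures
  versions.split(', ').map { |v| "gem install #{name} -v #{v}" }
end
puts commands
```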
<div>
<br /></div>
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-50089991405844826312013-10-16T17:06:00.001-07:002013-10-16T17:06:28.785-07:00Download Sqoop 2 from cloudera using apt-get install<div dir="ltr" style="text-align: left;" trbidi="on">
Today I had problems installing the Sqoop 2 server and client using apt-get. The problem was that apt wasn't able to fetch the correct package. I tried manually setting up the .list file in the /etc/apt/sources.list.d directory according to<br />
the Cloudera documentation <a href="http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.1/CDH4-Installation-Guide/cdh4ig_topic_4_4.html">http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.1/CDH4-Installation-Guide/cdh4ig_topic_4_4.html</a><br />
<br />
But that did not work either. Finally I was able to get it running by downloading and installing Cloudera's one-click-install Debian package from <a href="http://archive.cloudera.com/cdh4/one-click-install/precise/amd64/cdh4-repository_1.0_all.deb">http://archive.cloudera.com/cdh4/one-click-install/precise/amd64/cdh4-repository_1.0_all.deb</a>. Here's the link for lucid systems: <a href="http://archive.cloudera.com/cdh4/one-click-install/lucid/amd64/cdh4-repository_1.0_all.deb">http://archive.cloudera.com/cdh4/one-click-install/lucid/amd64/cdh4-repository_1.0_all.deb</a>.<br />
<br />
sudo dpkg -i cdh4-repository_1.0_all.deb<br />
<br />
Now try:<br />
<br />
sudo apt-get install sqoop2-server<br />
sudo apt-get install sqoop2-client</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-90950475780117238522013-09-05T19:44:00.000-07:002013-09-05T23:21:38.465-07:00Some links for Hadoop Performance Tuning<div dir="ltr" style="text-align: left;" trbidi="on">
Here's a list of links on Hadoop/Hive tuning techniques. This list was compiled by <a href="https://twitter.com/OngEmil" target="_blank">@OngEmil</a><br />
<br />
<br />
<blockquote style="text-align: -webkit-auto;" type="cite">
<ul style="background-color: rgba(255, 255, 255, 0);">
<li style="text-align: left;"><a href="http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html">http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html</a> – This one is the most interesting. It breaks down reducer logs line-by-line and shows how to optimize based on them using client-settable options.</li>
<li style="text-align: left;"><a href="http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/">http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/</a> - A grab bag of tips. Many of these are usable entirely at client-side.</li>
<li style="text-align: left;"><a href="http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/</a> - The internal settings are similar to the first link, but a good follow-on discussion.</li>
<li style="text-align: left;"><a href="http://www.slideshare.net/cloudera/mr-perf">http://www.slideshare.net/cloudera/mr-perf</a> – A nice slideshare with actual settings and explanations</li>
</ul>
</blockquote>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-18772601157486067452013-09-02T18:12:00.002-07:002013-09-02T18:16:16.377-07:00Hadoop monitoring and poor man's profiling<div dir="ltr" style="text-align: left;" trbidi="on">
<b>Hadoop Monitoring</b><br />
I created a simple script that monitors the Hadoop cluster for changes in the number of nodes. If you run it from an external tool such as Jenkins, you can send yourself an error email whenever the script exits with error code 1; you can also extend the script to send mail itself if you don't want to use an external tool. Since the hadoop dfsadmin -report command fails when the namenode is down, the script also alerts you when the namenode is unhappy. There are other ways to monitor a cluster, such as Cloudera Manager, but we decided to build our own tooling for the time being. You can also extend the script to check whatever you like on the nodes; I'm checking tmp directory space, since it often fills up the cluster when bad queries are executed on Hive.<br />
<br />
Here's the gist:<br />
<a href="https://gist.github.com/yash-ranadive/6418644">https://gist.github.com/yash-ranadive/6418644</a><br />
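The heart of such a check is just parsing the `hadoop dfsadmin -report` output and comparing the live datanode count against the expected cluster size. A minimal sketch (sample report text inlined here; the real script shells out to hadoop):

```ruby
# Extract the live datanode count from `hadoop dfsadmin -report` output.
def live_datanodes(report)
  report[/Datanodes available:\s*(\d+)/, 1].to_i
end

sample = "Configured Capacity: 1000\nDatanodes available: 9 (10 total, 1 dead)\n"
expected = 10
live = live_datanodes(sample)
puts "ALERT: only #{live}/#{expected} datanodes up" if live < expected
```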
<br />
<b>Poor Man's Profiling</b><br />
There are many ways to profile your Hadoop cluster to see what is causing slowness. One technique is to take thread dumps of the Java processes and see which methods appear most often. Take a few (10-15) thread dumps at random intervals; if you see the same methods being called over and over again, you can infer where your app is spending most of its time.<br />
<br />
We did one such exercise when we saw our Hive queries stuck in the last reduce phase. The single final reducer is almost always the bottleneck. To take the dump:<br />
1. Navigate to the resource manager (YARN) and click on the query you are running.<br />
2. Click on the application master link, then on the Map/Reduce links.<br />
3. More likely than not it is the reduce phase of the Hive-generated MapReduce job that runs slow. Click on the reduce tasks and find the server whose container has been running the longest.<br />
4. SSH in to that box and run the following multiple times at irregular intervals:<br />
<br />
<div style="text-align: center;">
<i>killall -QUIT java </i></div>
<div style="text-align: left;">
<i><br /></i></div>
<div style="text-align: left;">
This dumps the threads to STDOUT in the logs; you can navigate to the box's logs in the resource manager and look at the dumps. We realized that the compress method of the ZLIB compression class appeared most frequently in our dumps. Again, this is poor man's profiling and there are better ways to profile a Java process, but it worked for us: we got a performance increase after switching to LZO.</div>
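The counting step itself can be sketched in a few lines of Ruby (dump text inlined here for illustration; in practice you would read the log files):

```ruby
# Tally how often each stack frame appears across several thread dumps;
# the most frequent frames suggest where the job spends its time.
dumps = [
  "\"main\" prio=10\n  at java.util.zip.Deflater.deflate(Native Method)\n  at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java)\n",
  "\"main\" prio=10\n  at java.util.zip.Deflater.deflate(Native Method)\n  at java.io.DataOutputStream.write(DataOutputStream.java)\n"
]

frames = Hash.new(0)
dumps.each do |dump|
  dump.each_line { |l| frames[l.strip] += 1 if l.lstrip.start_with?('at ') }
end

frame, count = frames.max_by { |_, n| n }
puts "#{count}x #{frame}"
```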
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com2tag:blogger.com,1999:blog-2720443514548354607.post-25632078216638026822013-09-02T18:05:00.001-07:002013-09-02T18:16:33.390-07:00How to store encrypted passwords in a YAML file Ruby<div dir="ltr" style="text-align: left;" trbidi="on">
YAML works great for storing config information for Ruby scripts. First, if you are writing Ruby scripts, I highly recommend storing your authentication information in an external file; checking in auth info is a bad idea. If your source control system is ever compromised, the attacker has access to all your usernames and passwords. <br />
<div>
<br /></div>
<div>
If you've tried to store any sort of non-UTF-8 password in YAML, you know how painful it is to retrieve. In fact, I haven't found a way to retrieve it at all: YAML reads only UTF-8. </div>
<div>
<br />
<b>Encryption</b></div>
<div>
The first thing to make sure of is that you are storing the password in the correct encoding. If you use a gem like 'encryptor', check the encoding of the encrypted password first:<br />
encrypted_pass = Encryptor.encrypt(blah blah).<i><b>encoding</b></i><br />
<br />
If it is not UTF-8 you can force encoding by doing<br />
encrypted_pass = Encryptor.encrypt(blah blah).<b><i>force_encoding</i></b>('UTF-8')<br />
<br />
<div class="p1">
I like to Base64-encode and then UTF-8-encode the password coming out of shuber's encryptor gem. The gem by default spits out an ASCII-8BIT string.</div>
<div class="p1">
<br /></div>
<div class="p1">
# To encrypt a fresh password to store in a file run:</div>
<div class="p1">
Base64.encode64(Encryptor.encrypt(passwd,:key => secret_key, :algorithm => 'aes-256-ecb')).force_encoding('UTF-8')</div>
<br />
<div class="p2">
<span class="s1"><br /></span></div>
<div class="p2">
<span class="s1">Copy and paste the above password in the YAML file.</span></div>
<div class="p2">
<span class="s1"><br /></span></div>
<div class="p2">
<span class="s1"><br /></span></div>
<div class="p2">
<span class="s1"><b>Decryption</b></span></div>
<div class="p2">
<span class="s1"># To decrypt the password from the YAML file</span></div>
<div class="p2">
<span class="s1">account_config = YAML.load_file('name_of_file.yml')</span></div>
<div class="p2">
</div>
<div class="p1">
account_password = Encryptor.decrypt(Base64.decode64(account_config[<span class="s1">'password'</span>].force_encoding(<span class="s1">'ASCII-8BIT'</span>)), :key => secret_key, :algorithm => <span class="s1">'aes-256-ecb'</span>)</div>
</div>
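Putting the pieces together, here is a hedged round-trip sketch using OpenSSL directly rather than the encryptor gem, with the same aes-256-ecb algorithm as above (note that ECB mode has known weaknesses; prefer an IV-based mode for new code):

```ruby
require 'openssl'
require 'base64'
require 'yaml'

# Derive a 32-byte key, encrypt, and Base64-encode so the ciphertext
# is YAML-safe UTF-8.
key = OpenSSL::Digest.digest('SHA256', 'secret_key')

cipher = OpenSSL::Cipher.new('aes-256-ecb')
cipher.encrypt
cipher.key = key
stored = Base64.encode64(cipher.update('my_password') + cipher.final).force_encoding('UTF-8')

# Round-trip through YAML, then reverse the steps to decrypt.
yaml_value = YAML.safe_load({ 'password' => stored }.to_yaml)['password']

decipher = OpenSSL::Cipher.new('aes-256-ecb')
decipher.decrypt
decipher.key = key
puts decipher.update(Base64.decode64(yaml_value)) + decipher.final
```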
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-53986109739038194892013-09-02T17:49:00.002-07:002013-09-02T18:16:45.122-07:00Simple Cheat Sheet Search using Grep for your Mac<div dir="ltr" style="text-align: left;" trbidi="on">
Often I find myself referring to the same list of commands for utilities such as git, hadoop, mysql, ruby, tmux, etc. The usual way is to just google them: you are very likely to find what you're looking for in a few clicks. A faster way is to keep a cheat sheet of the commands you use often and search that text file when you need one.<br />
<br />
The problem with this approach is that it is almost as painful as the first. Imagine you have 8-10 tabs open and your text-editor tab is somewhere in the middle: that's four keystrokes to reach it, one more to search for the text, then scrolling to find the entry you want. Too many keystrokes and mouse events, and it's just really frustrating.<br />
<br />
I created a simple hack that greps for the line you want and shows the two lines beneath each match.<br />
<br />
dn.sh<br />
#!/bin/sh<br />
read search_string<br />
grep -n -i -A 2 "$search_string" ~/cheatsheet.txt<br />
<div>
<br /></div>
<br />
So, for example, if your cheat sheet has a command description followed by the command usage, looking something like this:<br />
<br />
. . .<br />
Git Squash Commit (prefix every line to squash with s)<br />
git rebase -i master<br />
<br />
Git show remotes<br />
git remote show<br />
<br />
Git branches<br />
https://github.com/Kunena/Kunena-2.0/wiki/Create-a-new-branch-with-git-and-manage-branches<br />
<br />
Git Delete Branch<br />
git branch -D <branch name=""></branch><br />
. . .<br />
<br />
Then you can open Spotlight with Command + Space, type dn, and hit enter. Type your search string "squash" and you will see the following:<br />
315:Git <b>Squash</b> Commit (prefix every line to <b>squash</b> with s)<br />
316-git rebase -i master<br />
317-</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-4053600241828910102013-09-02T17:17:00.000-07:002013-09-02T18:17:03.567-07:00Random JSON Payload generator in Ruby<div dir="ltr" style="text-align: left;" trbidi="on">
Here's a quick random JSON generator in Ruby for testing your code:<br />
<br />
require 'json'<br />
<br />
# Get a random string of the given length<br />
def get_random_string(character_count)<br />
  o = [('a'..'z'), ('A'..'Z')].map { |i| i.to_a }.flatten<br />
  (0...character_count).map { o[rand(o.length)] }.join<br />
end<br />
<br />
# Get a random JSON payload; inputs are the number of fields and the size of each field<br />
def get_random_json_payload(number_of_fields, field_size)<br />
  fields = {}<br />
  # Generate the payload hash<br />
  (1..number_of_fields).each do |field_number|<br />
    fields['field_' + field_number.to_s] = get_random_string(field_size)<br />
  end<br />
<br />
  fields.to_json<br />
end<br />
<br />
puts get_random_json_payload(2, 10)<br />
<br />
The output will look something like this:<br />
{"field_1":"VUQZDpYRgA","field_2":"LtMQYvSZca"}<br />
<br />
<br />
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-17994142785799581102013-09-02T17:14:00.001-07:002013-09-02T18:17:19.014-07:00How to set up a simple proxy server on Digital Ocean/ VPS<div dir="ltr" style="text-align: left;" trbidi="on">
SSH in to your Digital Ocean droplet and run the following:<br />
<br />
sudo apt-get install tinyproxy #install tinyproxy<br />
which tinyproxy # make sure tinyproxy is installed<br />
ps -ef | grep proxy # make sure tinyproxy is running<br />
<br />
Now forward a port from your local machine to the remote proxy on the droplet:<br />
ssh -N root@dro.ple.tip.xxx -L local_port:localhost:remote_port<br />
e.g. ssh -N root@198.199.xxx.xxx -L 8000:localhost:8888<br />
<br />
Send a request to localhost at port 8000 and see if it appears in the tinyproxy logs. If you are using Ruby (e.g. Mechanize) to crawl or scrape websites, you can set the proxy on the agent:<br />
<br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;">agent.set_proxy 'localhost', 8000</span><br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;">agent.get("somewebsite")</span><br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 24.4375px;"><br /></span></span>
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 24.4375px;">See the results on your droplet appear in:</span></span><br />
<span style="background-color: white; color: #444444; font-family: arial, sans-serif; font-size: x-small; line-height: 24.44444465637207px;">/var/log/</span><span style="background-color: white; color: #444444; font-family: arial, sans-serif; font-size: x-small; font-weight: bold; line-height: 24.44444465637207px;">tinyproxy</span><span style="background-color: white; color: #444444; font-family: arial, sans-serif; font-size: x-small; line-height: 24.44444465637207px;">.</span><span style="background-color: white; color: #444444; font-family: arial, sans-serif; font-size: x-small; font-weight: bold; line-height: 24.44444465637207px;">log</span></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com1tag:blogger.com,1999:blog-2720443514548354607.post-41823441364898512832013-09-02T16:52:00.004-07:002013-09-02T18:17:46.280-07:00Create a Simple HTTP Server on a directory to bypass same origin policy<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Let's say you are developing a website that uses JavaScript on your laptop. Often the browser will not allow you to access certain resources because of the <a href="http://en.wikipedia.org/wiki/Same-origin_policy" target="_blank">same-origin policy</a>. And often you are working on a bunch of HTML files and just want to serve a few of them over HTTP. <br />
<br />
You can do it quickly by creating a simple HTTP server that will open access to all files in the directory over HTTP. Here's the nifty python command that does that:<br />
<br />
python -m SimpleHTTPServer 8900<br />
<br />
Where 8900 is the port you want the content served on.<br />
<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-62065185828504179932013-08-19T15:33:00.002-07:002013-09-02T18:18:21.972-07:00Find files and list their details<div dir="ltr" style="text-align: left;" trbidi="on">
One of the things I find myself doing often on *nix systems is finding files and looking at their last-modified times.<br />
<br />
Finding the files is easy<br />
<br />
$ find / -name somefile.txt<br />
/etc/files/somefile.txt<br />
<br />
Now if you want to get details for the files such as last modified, etc.<br />
<br />
$ find / -name somefile.txt | xargs ls -la<br />
-rw-r--r-- 1 user group 1129 Nov 3 2011 /etc/files/somefile.txt</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-73715381002201215432013-07-27T00:27:00.001-07:002013-07-27T00:27:53.316-07:00Leap Motion Experience sucks<div dir="ltr" style="text-align: left;" trbidi="on">
I hate the experience behind Leap Motion. From beginning to end it has been crapulous. They are really, really bad at running the company. Just today I got the controller in the mail and couldn't wait to try it out. First, the controller wasn't recognized by the bundled software, and I had to disconnect the damn thing at least 10 times before it actually recognized the device.<br />
<br />
After it was recognized, I went through the orientation, which was OK except that it couldn't figure out the position of my fingertips during certain hand gestures. The next step was to download some apps from Airspace, which is Leap Motion's app store. I completed the form for a new account and it took me back to the login screen without telling me whether I had registered successfully. I tried logging in with my new credentials and ehhh....can't. I tried multiple times, same thing.<br />
<br />
This is not the first time the company has disappointed me. The delay in shipment was very frustrating, and on top of that I had to re-enter my credit card information at least three times, because every time I did they'd send an email saying it didn't go through correctly.<br />
<br />
Really pissing off. It is fine and dandy to have slick demos and such, but you really don't need any of that: their first iteration should've been something simple, aimed mainly at developers. The whole experience is really frustrating. Before buying, I advise people to wait until the company gets its act together.<br />
<br />
Really shameful to see how poorly they've handled this.</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com11tag:blogger.com,1999:blog-2720443514548354607.post-72043161324583580262013-07-10T23:17:00.003-07:002013-07-10T23:17:51.748-07:00Adding custom jar files for UDFs in Hive cli using hiverc<div dir="ltr" style="text-align: left;" trbidi="on">
To add custom functionality such as UDF jars while using the Hive CLI, just add your settings to the .hiverc file.<br />
<br />
So let's say you are developing a UDF on your machine and want to test how it behaves in Hive: simply put the lines below in your .hiverc and they will be loaded whenever you start a new Hive CLI session.<br />
<br />
add jar CustomUDFLib.jar;<br />
create temporary function function1 AS 'com.yourcompany.hive.udf.function1';<br />
create temporary function function2 AS 'com.yourcompany.hive.udf.function2';<br />
<br />
<br />
The .hiverc file, like all rc files, must reside in your home directory.<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-67123719279844324592013-07-10T23:13:00.001-07:002013-07-10T23:13:06.119-07:00Hive Metastore Data Model<div dir="ltr" style="text-align: left;" trbidi="on">
The Hive metastore data model is really interesting once you start to dig in to users, permissions, roles, etc.<br />
<br />
Here's a link to the ER diagram:<br />
<a href="https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf">https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf</a></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-77842805679375449552013-07-10T23:09:00.001-07:002013-07-16T10:47:51.339-07:00Use Custom UDFs with Hue<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
If you want to use custom UDFs with hue, here are the steps you need to follow:<br />
<br />
1. Generate the UDF jar<br />
2. Drop the jar into HDFS<br />
hadoop dfs -copyFromLocal udf.jar /path/to/file/in/hdfs<br />
3. In order for Hue to recognize the jar, you need to add the hive.aux.jars.path property to the hive-site.xml file on the Hue box:<br />
&lt;property&gt;<br />
&nbsp;&nbsp;&lt;name&gt;hive.aux.jars.path&lt;/name&gt;<br />
&nbsp;&nbsp;&lt;value&gt;hdfs:///path/to/file/in/hdfs&lt;/value&gt;<br />
&lt;/property&gt;<br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 16px;"></span></span></div>
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 16px;">4. Create the function definition using the Hue Query Editor (there's a section on the left-hand side to add jars, functions, etc.)</span></span><br />
<span style="color: #444444; font-family: arial, sans-serif; font-size: x-small;"><span style="line-height: 16px;"><br /></span></span>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com2tag:blogger.com,1999:blog-2720443514548354607.post-22250152643090633642013-07-10T22:05:00.003-07:002013-07-10T22:05:36.384-07:00Mysql difference between := and = in defining user variables<div dir="ltr" style="text-align: left;" trbidi="on">
In MySQL you will notice both := and = used as assignment operators for user variables. But which one to use when? This always confused me.<br />
<br />
The short answer: you have to use := when you assign inside a SELECT clause, e.g.<br />
SELECT @var := some value<br />
SELECT @last_date := IFNULL(MAX(date_observed), '2013-05-01') FROM mart_dev_console_stats<br />
<div>
<br /></div>
<br />
When using the SET statement, you have to use =, e.g.<br />
SET @var = some value<br />
<br />
Another thing to note: in stored procedures, a SELECT @var := some_value actually returns a result set to the client. If you are using the variable for further processing and don't want to send it to the client, I suggest using<br />
<br />
SELECT something into @var<br />
FROM table<br />
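Combining this with the earlier example, the no-result-set version would look like:

```sql
-- store the newest observed date without returning a result set
SELECT IFNULL(MAX(date_observed), '2013-05-01')
INTO @last_date
FROM mart_dev_console_stats;

-- @last_date can now be used in later statements
SELECT * FROM mart_dev_console_stats WHERE date_observed > @last_date;
```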
<br />
See <a href="http://dev.mysql.com/doc/refman/5.0/en/select-into.html">http://dev.mysql.com/doc/refman/5.0/en/select-into.html</a> for more info<br />
<br /></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-76247718922761277672013-07-02T00:14:00.001-07:002013-07-02T00:14:05.813-07:00Delete Jenkins Jobs using CURL<div dir="ltr" style="text-align: left;" trbidi="on">
It is easy to automate the deletion of multiple jobs in Jenkins. In Google Chrome, open Developer Tools > Network. Now click the delete-job link on one of the Jenkins jobs you want to delete.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYPv227kWjaUvIEGORGi5eRm8qH_qgdIkZ5PIbolpoDWna8-gvRQL-GSh9PLlEpyWUwrnlsj8L4oRXOAnfEaoiNhswKyILhcjyjhx_05OhGFxuwnojiyjST7wPXHlfOzND2nW1XsFI-Mg/s874/Screen+Shot+2013-07-02+at+12.05.44+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYPv227kWjaUvIEGORGi5eRm8qH_qgdIkZ5PIbolpoDWna8-gvRQL-GSh9PLlEpyWUwrnlsj8L4oRXOAnfEaoiNhswKyILhcjyjhx_05OhGFxuwnojiyjST7wPXHlfOzND2nW1XsFI-Mg/s320/Screen+Shot+2013-07-02+at+12.05.44+AM.png" width="218" /></a></div>
<br />
Now inside the Developer Tools > Network section you will see a POST request being made to the Jenkins server. Right-click on the POST request and choose Copy as cURL.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4t2mFjuZkWRcFpRTlhLyNGv14f0RST7sFdsAZbVGSMyWjVVPCuwfLEIHcs30OL-a4LV15Bj7cEjtE0YQR3VDEaDr4etuHw16a4UJ7KSVl-i5m-mS2-jeoVoj_hDCf0Fj3LUW6GfkZ1zg/s1600/Screen+Shot+2013-07-02+at+12.10.07+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4t2mFjuZkWRcFpRTlhLyNGv14f0RST7sFdsAZbVGSMyWjVVPCuwfLEIHcs30OL-a4LV15Bj7cEjtE0YQR3VDEaDr4etuHw16a4UJ7KSVl-i5m-mS2-jeoVoj_hDCf0Fj3LUW6GfkZ1zg/s320/Screen+Shot+2013-07-02+at+12.10.07+AM.png" width="320" /></a></div>
<br />
Your string will look something like<br />
curl "http://servername:port/job/job_name/doDelete" -X POST -H "Cookie: JSESSIONID..." -H "Referer: http://servername:port/job/job_name/" -H "Connection: keep-alive" -H "Content-Length: 0"<br />
<br />
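To delete many jobs at once, the copied command can be wrapped in a small shell loop. This is a sketch: the server address and cookie are placeholders you must paste in from your own copied cURL string, and the script only prints the commands (a dry run) so you can review them before executing anything.

```shell
#!/bin/bash
# Placeholders: fill these in from your own "Copy as cURL" string
SERVER="http://servername:port"
COOKIE="JSESSIONID..."

# Build the doDelete URL for a given job name
delete_url() {
  echo "${SERVER}/job/${1}/doDelete"
}

# Dry run: print the curl command for each job instead of executing it.
# When the output looks right, pipe it to sh (or drop the echo).
for job in old_job_1 old_job_2 old_job_3; do
  echo curl -X POST -H "Cookie: ${COOKIE}" "$(delete_url "${job}")"
done
```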
Replace job_name with whatever job you want to delete and run the curl commands as part of a script. Voila! Fast deletes in Jenkins!</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com2tag:blogger.com,1999:blog-2720443514548354607.post-39309576479557289512013-06-22T11:57:00.001-07:002013-08-14T11:47:35.255-07:00Upstart script for making hive as a service<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">Put this .conf file in your /etc/init folder as hiveserver.conf; the file name determines the service name used in the start/stop commands below. </span><br />
<br style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">description "Hive Server"</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">start on runlevel [2345]</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">stop on runlevel [016]</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">expect fork </span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">script</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> echo "Starting Hive Service"</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> export HIVE_HOME=/usr/lib/hive-0.11.0-bin</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> export HIVE_CONF_DIR=/etc/hive/conf/hiveserver1</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;"> $HIVE_HOME/bin/hive --service hiveserver > /var/log/hive/hiveserver.out 2>&1 &</span></span><br />
<span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"><span style="font-size: 13px; line-height: 18px;">end script</span></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><br /></span>
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><br /></span><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">To start/stop/check your service you can use</span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">start hiveserver</span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">stop hiveserver</span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">status hiveserver</span></div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-83170251967395314832013-06-08T15:05:00.003-07:002013-06-08T15:05:27.910-07:00Write a simple Service using Upstart on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on">
There are two ways of writing a service on Ubuntu. One is by dropping a complex config file in init.d and the other is by using "upstart".<br />
<br />
Upstart makes it effortless to write services. All you have to do is put a configuration file in /etc/init and upstart takes care of the rest. You can then start and stop services by using:<br />
start test<br />
stop test<br />
<br />
To get the status of the service, you can do<br />
status test<br />
<br />
Here's a sample service that calls a shell script that prints strings on the console:<br />
<br />
################ test.conf<br />
description "Test Service"<br />
author "Yash Ranadive &lt;yash@lookout.com&gt;"<br />
<br />
start on runlevel [2345]<br />
stop on runlevel [016]<br />
<br />
respawn<br />
<br />
script<br />
/home/yranadive/print_delay.sh >> /tmp/print_delay.out 2>&1<br />
end script<br />
<br />
<br />
You need to know how the program you are trying to run behaves, e.g. whether or not it forks after it starts. If the program daemonizes itself, you will need to add expect daemon to the conf file.<br />
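For example, for a program that daemonizes itself, the conf file might look like this (the description, path, and name are made up for illustration):

```conf
description "My Daemon"

start on runlevel [2345]
stop on runlevel [016]

# the daemon forks twice as it detaches, so tell upstart
# to follow the second fork and track the right pid
expect daemon

exec /usr/local/bin/mydaemon
```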
<br />
Here's the shell script print_delay.sh<br />
<br />
#!/bin/bash<br />
c=1<br />
while [ $c -le 5 ]<br />
do<br />
echo "Test line at $(date)"<br />
sleep 10<br />
(( c++ ))<br />
done<br />
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0tag:blogger.com,1999:blog-2720443514548354607.post-16156624309025245142013-05-22T00:06:00.002-07:002013-05-22T00:06:58.547-07:00Setting up Tmux on the mac<div dir="ltr" style="text-align: left;" trbidi="on">
Tmux is used to multiplex several consoles while you are logged into a Linux box or in your Mac terminal. It allows you to create split panes so you can see more on the screen. You can watch the output of top in one pane while you modify some code in another.<br />
<br />
Panes are easy to create, and moving between them is quick. See the thoughtbot link below for how to use Tmux.<br />
<br />
Here are the links you need to set up tmux on your Mac. Make sure you follow the thoughtbot tutorial to customize your ~/.tmux.conf file. If you are coming from the GNU Screen world, you will likely want to set your prefix key to Ctrl-A.<br />
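For example, a minimal ~/.tmux.conf that moves the prefix from the default Ctrl-B to Ctrl-A looks like this:

```conf
# use Ctrl-a as the prefix key, like GNU Screen
set -g prefix C-a
unbind C-b

# press Ctrl-a twice to send a literal Ctrl-a to the program in the pane
bind C-a send-prefix
```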
<br />
https://blogs.oracle.com/unixben/entry/install_tmux_on_mac_os<br />
<br />
http://robots.thoughtbot.com/post/2641409235/a-tmux-crash-course</div>
Anonymoushttp://www.blogger.com/profile/07756830537136925187noreply@blogger.com0