My Hadoop Gotchas

Posted: November 3rd, 2009 | Author: ryan | Filed under: Hadoop, Programming, Tips and Tricks

Hadoop and I have had our ups and downs lately. I have been accumulating personal notes about certain issues I’ve had and their solutions. This is a record of the things that have bitten me.



If your streaming job has a lot of dependencies that you want to ship along with it, you might find it useful to bundle them into a single archive and throw a -file my_deps.tgz at hadoop. The gotcha is this: hadoop won’t automatically unpack the archive for you. To get around this, simply wrap your mapper/reducer with another script that unpacks the archive and then executes your original mapper/reducer.

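#!/bin/bash

# Unpack the shipped dependencies into the task's working directory.
# No -v here: tar would list file names on stdout and pollute the mapper output.
tar -xzf my_deps.tgz
# Calling python explicitly means mapper.py no longer needs the +x bit that
# Hadoop required when it ran the mapper directly. Hadoop now executes
# mapper_wrapper.sh instead, so don't forget to chmod +x the wrapper.
python ./mapper.py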
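
If you would rather keep the wrapper in Python, the same idea can be sketched roughly like this. It is only a sketch: the function name unpack_and_run is made up for illustration, and it assumes the archive is a gzipped tarball sitting in the task's working directory, as in the shell version above.

```python
import subprocess
import tarfile


def unpack_and_run(archive="my_deps.tgz", command=("python", "./mapper.py")):
    """Extract `archive` into the current directory, then run `command`.

    Hypothetical helper, not part of Hadoop. Returns the command's exit code.
    """
    # Hadoop does not unpack files shipped with -file, so do it here.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(".")
    # The child process inherits stdin and stdout, which is how a streaming
    # job feeds records to the mapper and collects its output.
    return subprocess.call(list(command))


# As a wrapper script, the file would end with:
#     import sys
#     sys.exit(unpack_and_run())
```

Passing the exit code back through sys.exit matters, because a wrapper that always exits 0 would hide a failing mapper.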



I was having a lot of issues with HDFS. I couldn’t run a bin/hadoop dfs -mkdir, -put, or -copyFromLocal without all kinds of connection problems or cryptic Java errors.

After copious amounts of frustration I finally seemed to fix the issue, or at least find a work-around.

Warning: the following commands will destroy your data in HDFS.

> bin/stop-dfs.sh
> rm -rf hadoop-ryan/*
> bin/hadoop namenode -format
> bin/start-dfs.sh

Note that hadoop-ryan is my hadoop.tmp.dir, relative to the Hadoop directory I ran the commands from. It comes from this property:

<property>
<name>hadoop.tmp.dir</name>
<value>/Users/ryan/Dev/hadoop/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
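
To see why the directory ends up being called hadoop-ryan, here is a rough sketch of how the ${user.name} placeholder expands. The resolve_property helper is hypothetical and only handles that one placeholder. The <configuration> wrapper is the usual root element of a Hadoop site file, added here so the snippet parses on its own.

```python
import getpass
import xml.etree.ElementTree as ET

# The property from the post, wrapped in a <configuration> root element.
CONFIG_XML = """
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/ryan/Dev/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>
"""


def resolve_property(xml_text, name, user=None):
    """Return property `name` with ${user.name} expanded, or None if absent.

    Hypothetical helper for illustration; Hadoop does its own expansion.
    """
    user = user or getpass.getuser()
    root = ET.fromstring(xml_text.strip())
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value").replace("${user.name}", user)
    return None


print(resolve_property(CONFIG_XML, "hadoop.tmp.dir", user="ryan"))
# prints /Users/ryan/Dev/hadoop/hadoop-ryan
```

Checking the resolved path before running rm -rf is cheap insurance against wiping the wrong directory.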

A Python Generator to Generate Sublists of Increasing Length

Posted: July 17th, 2009 | Author: ryan | Filed under: Python, Tips and Tricks

Here’s a nifty generator that takes a list, l, of length n and yields each of its n leading sublists, l[0:i], where i ranges from 1 to n inclusive (that is, over range(1, n+1)).

def gen_sub_lists(l):
    """Generate sublists of a list.

    Sublists of [0,1,2,...,n] are
    [0], [0,1], [0,1,2], . . . , [0,1,2,...,n].
    """
    # end runs from 1 to len(l), so the last slice is the whole list.
    for end in range(1, len(l) + 1):
        yield l[0:end]
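
The slicing approach needs something with a length, so it won't work on an arbitrary iterator. As a variation (not the generator from this post; the name gen_prefixes is made up), here is a minimal sketch that builds the same prefixes from any iterable by accumulating items as it goes:

```python
def gen_prefixes(iterable):
    """Yield growing prefixes of any iterable as new lists.

    Hypothetical variant for illustration: works on iterators and
    strings as well as lists, since it never calls len() or slices.
    """
    prefix = []
    for item in iterable:
        prefix.append(item)
        # Yield a copy so later appends don't change what was handed out.
        yield list(prefix)


print(list(gen_prefixes(range(3))))
# prints [[0], [0, 1], [0, 1, 2]]
```

The copy on each step costs a little, but it keeps every yielded prefix independent of the others.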


Skype + Google = Free Calls

Posted: July 16th, 2009 | Author: ryan | Filed under: Tips and Tricks

If you didn’t know, you can make calls using Skype to 800 numbers for free.

Also, if you didn’t know, Google offers a free directory service called Goog-411. By calling 1-800-GOOG-411 and speaking the city, state, and business name, you can have Google connect you for free.

Do you see where I am going with this?

Just call 1-800-GOOG-411 using Skype, tell Google what you want, and you’re connected.

I find this especially useful when I am working in my office in the basement at school because cell reception is terrible.