Saturday, December 1, 2012

Getting Started with Hadoop on Ubuntu

So, I'm trying to play with Hadoop again (haven't done it in a while), and since Ubuntu is my current weapon of choice, I went looking for a guide. I found a great tutorial (Noll's), but I wanted something even simpler: a script, plus a sample program and instructions on how to compile and run it. So I created one and made it available for download (at the very bottom are the differences with Noll's tutorial).

You just need to download it and make it executable. You probably want to look at it in your favorite editor first (it is not a good idea to just run a script from the internet; I trust myself, but you shouldn't trust me), and you may want to change the mirror while you're at it (I live in Atlanta, so I use Georgia Tech's). Once you're happy, run it as root, and you should be done with the installation!

The script creates a user for Hadoop, called hduser; switch to that user. Then, as that user, set up your path and classpath (the classpath is needed for compiling), and start Hadoop.

Now download my sample program (it is the standard WordCount example from the tutorial, but without the package statement, so you can compile it directly from that folder), compile it, and create a jar file.

Next, we need to put some data into Hadoop. First create a folder and copy a file into it (the sample program itself will do, since we just need a text file), then copy that folder into Hadoop and list it to verify it's there. Now we can run our program in Hadoop.

When you want to stop Hadoop, just run the stop command; also, if you want to copy the output to your file system, just use the -copyToLocal option of hadoop's dfs.

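
For reference, here is a rough sketch of those steps as shell functions. It is not the original command listing: the script name (install-hadoop.sh), the install location (/usr/local/hadoop), the source file name (WordCount.java), the jar name (wc.jar), and the input/output folder names are all assumptions, and start-all.sh/stop-all.sh assume a Hadoop 1.x layout. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the walkthrough. Values marked ASSUMPTION are placeholders,
# not names taken from the post.
INSTALL_SCRIPT="${INSTALL_SCRIPT:-./install-hadoop.sh}"  # ASSUMPTION: hypothetical file name
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"          # ASSUMPTION: install location
DRY_RUN="${DRY_RUN:-1}"                                  # 1 = print commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# As your normal user: make the downloaded script executable, then run it as root.
install_hadoop() {
  run chmod +x "$INSTALL_SCRIPT"
  run sudo "$INSTALL_SCRIPT"
}

# As hduser (switch with: su - hduser). The classpath is only needed for javac.
# ASSUMPTION: Hadoop 1.x layout, jars directly under HADOOP_HOME and HADOOP_HOME/lib.
setup_env() {
  export PATH="$PATH:$HADOOP_HOME/bin"
  export CLASSPATH="$HADOOP_HOME/*:$HADOOP_HOME/lib/*:."
}

start_hadoop() { run start-all.sh; }   # Hadoop 1.x start script
stop_hadoop()  { run stop-all.sh; }    # Hadoop 1.x stop script

# Compile the sample (ASSUMPTION: saved as WordCount.java) and package it.
build_wordcount() {
  run mkdir -p classes
  run javac -d classes WordCount.java
  run jar cf wc.jar -C classes .
}

# Create a local folder with one text file, copy it into HDFS, and list it.
load_input() {
  run mkdir -p input
  run cp WordCount.java input/
  run hadoop dfs -copyFromLocal input input
  run hadoop dfs -ls input
}

# Run the job; the standard WordCount takes an input and an output path.
run_job() {
  run hadoop jar wc.jar WordCount input output
}

# Copy the job output from HDFS back to the local file system.
fetch_output() {
  run hadoop dfs -copyToLocal output ./output
}
```

Source the file and call the functions in order (install_hadoop as your own user; the rest as hduser). Once the names match your setup, set DRY_RUN=0 to actually run the commands.
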
The install script is completely automated, so you can even use it when starting an Amazon EC2 instance; for example, you can start a micro instance with an Ubuntu 12.04 daily build (the one for Dec-1-2012; change the AMI ID to get a different one :) and a key named mac-vm.
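
A launch along those lines might look like the sketch below. It is a stand-in, not the original command: it uses the AWS CLI, assumes the install script is passed as instance user data (Ubuntu cloud images run a user-data script starting with #! as root on first boot), assumes t1.micro as the micro instance type, and uses a hypothetical script name. The AMI ID is deliberately left as a placeholder.

```shell
#!/bin/sh
# Sketch only: prints the launch command unless LAUNCH=1 is set.
AMI_ID="${AMI_ID:-ami-XXXXXXXX}"                        # placeholder: fill in the AMI ID you want
INSTANCE_TYPE="${INSTANCE_TYPE:-t1.micro}"              # ASSUMPTION: the "micro" type of that era
KEY_NAME="${KEY_NAME:-mac-vm}"                          # key name mentioned in the post
INSTALL_SCRIPT="${INSTALL_SCRIPT:-install-hadoop.sh}"   # ASSUMPTION: hypothetical file name

launch_instance() {
  # Build the AWS CLI call; file:// makes the CLI read user data from a local file.
  set -- aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type "$INSTANCE_TYPE" \
    --key-name "$KEY_NAME" \
    --user-data "file://$INSTALL_SCRIPT"
  if [ "${LAUNCH:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}
```

Calling launch_instance prints the command so you can check it; run it with LAUNCH=1 once the AMI ID and script path are real.
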

