Saturday, December 1, 2012

Getting Started with Hadoop on Ubuntu

So, I'm trying to play with Hadoop again (I haven't done it in a while), and since Ubuntu is my current weapon of choice, I found a great tutorial at http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/, but I wanted something even simpler: a script, plus a sample program and instructions on how to compile and run it. So I created one (at the very bottom are the differences from Noll's tutorial). It is available at https://github.com/okaram/scripts/blob/master/hadoop/install.sh.

You just need to download the script and make it executable. You'll probably want to look at it in your favorite editor first (it is not a good idea to just run a script from the internet; I trust myself, but you shouldn't trust me), and you may want to change the mirror while you're at it (I live in Atlanta, so I use Georgia Tech's). Once you're happy, run it as root, and you should be done with the installation!

The script creates a user for Hadoop, called hduser. Change to that user, and then, as hduser, set up your path and classpath (the classpath is needed for compiling) and start Hadoop.

Now download my sample program (it is the standard WordCount example from the tutorial, but without the package statement, so you can compile it directly from that folder), compile it, and create a jar file.

Next, we need to put some data into Hadoop: first we create a folder and copy a file into it (our same WordCount.java, since we just need a text file), then we copy that folder into Hadoop (and list it, to verify it's there), and now we can run our program. When you want to stop Hadoop, just run the stop-all.sh command; and if you want to copy the output back to your file system, use the -copyToLocal option of hadoop's dfs. The whole sequence is sketched step by step below.
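First, fetching and running the installer. A minimal sketch; I'm assuming the raw GitHub URL for the script here, so adjust it if the repository layout changes:

```bash
# Download the install script (raw URL assumed from the repo link above)
wget https://raw.github.com/okaram/scripts/master/hadoop/install.sh
chmod +x install.sh

# Look it over in your editor first, then run it as root
sudo ./install.sh
```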
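Switching users and setting up the environment might look like this; the /usr/local/hadoop install location and the hadoop-core jar name are my assumptions, so check where the script actually puts things:

```bash
# Become the hadoop user created by the script
sudo su - hduser

# Assumed install location; adjust to wherever install.sh put Hadoop
export PATH=$PATH:/usr/local/hadoop/bin
export CLASSPATH=$(ls /usr/local/hadoop/hadoop-core-*.jar)

# Start the HDFS and MapReduce daemons
start-all.sh
```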
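Compiling the sample is plain javac plus jar; again, the raw URL for WordCount.java is an assumption:

```bash
# Fetch and compile the sample (CLASSPATH set above picks up the Hadoop jar)
wget https://raw.github.com/okaram/scripts/master/hadoop/WordCount.java
javac WordCount.java

# Package all the generated classes into a jar
jar cvf wordcount.jar *.class
```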
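Finally, loading the data and running the job. These are the standard Hadoop 1.x dfs commands; the input and output folder names are just the ones I'd pick:

```bash
# Create a local folder with a text file in it (WordCount.java itself)
mkdir input
cp WordCount.java input/

# Copy the folder into HDFS and list it to verify it's there
hadoop dfs -copyFromLocal input input
hadoop dfs -ls input

# Run the job; results land in the 'output' folder in HDFS
hadoop jar wordcount.jar WordCount input output

# Copy the results back to the local file system, then shut Hadoop down
hadoop dfs -copyToLocal output output
stop-all.sh
```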
The install script is completely automated, so you can even use it to bring up an Amazon EC2 instance; for example, you can start a micro instance with an Ubuntu 12.04 daily build (for Dec-1-2012; change the AMI id to get a different one :) and a key named mac-vm, as sketched below.
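With the old ec2-api-tools, the command would be something like this; the AMI id below is only a placeholder, since the one for the Dec-1-2012 daily build isn't reproduced here:

```bash
# t1.micro instance, Ubuntu 12.04 daily AMI (placeholder id), key pair mac-vm
ec2-run-instances ami-xxxxxxxx -t t1.micro -k mac-vm
```

Once the instance is up, you can ssh in and run the install script exactly as above.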
