Install Hadoop on AWS Ubuntu Instance

15 October 2015

Step 1: Create an Ubuntu 14.04 LTS instance on AWS

Step 2: Connect to the instance

chmod 400 yourKey.pem

ssh -i yourKey.pem ubuntu@your_instance_ip

Step 3: Install Java

sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update

sudo apt-get install oracle-java6-installer

sudo update-java-alternatives -s java-6-oracle

sudo apt-get install oracle-java6-set-default

Step 4: Add a Hadoop user

sudo addgroup hadoop

sudo adduser --ingroup hadoop hduser

Step 5: Create SSH key for password-free login

su - hduser

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
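If ssh localhost in the next step still asks for a password, the permissions on the key files are the usual culprit: sshd ignores keys whose files are group- or world-readable. Tightening them is a safe extra step (a minimal sketch, assuming the default hduser home directory):

```shell
# Make sure the key material is private to hduser;
# sshd refuses authorized_keys files that others can read.
mkdir -p "$HOME/.ssh"                 # already present after ssh-keygen
touch "$HOME/.ssh/authorized_keys"    # already present after the cat above
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```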

Step 6: Try connection

ssh localhost

exit

Step 7: Download and Install Hadoop

cd /usr/local

sudo wget http://apache.01link.hk/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz

sudo tar -xzvf hadoop-1.2.1.tar.gz

sudo mv hadoop-1.2.1 hadoop

sudo chown -R hduser:hadoop hadoop

sudo rm hadoop-1.2.1.tar.gz

Step 8: Update .bashrc

su - hduser

vim $HOME/.bashrc

Add the following content to the end of the file:

export HADOOP_PREFIX=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
export PATH=$PATH:$HADOOP_PREFIX/bin

Then save it with :wq and reload .bashrc:

source ~/.bashrc

Step 9: Configure Hadoop while logged in as hduser

cd /usr/local/hadoop/conf

vim hadoop-env.sh

Add the following lines to the file:

export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export HADOOP_CLASSPATH=/usr/local/hadoop

Save and Exit :wq

Step 10: Create a temporary directory for Hadoop

exit

sudo mkdir -p /app/hadoop/tmp

sudo chown hduser:hadoop /app/hadoop/tmp

sudo chmod 750 /app/hadoop/tmp

Step 11: Edit the configuration files

su - hduser

cd /usr/local/hadoop/conf

vim core-site.xml

Put the following content between the <configuration> … </configuration> tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
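For reference, after pasting the snippet the complete core-site.xml has the shape below (the two header lines come from the stock template shipped with Hadoop 1.x; descriptions abbreviated here):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
```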

Save and exit :wq

Also edit this file:

vim mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

Save and exit :wq

And edit this file:

vim hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>

Step 12: Format the HDFS

/usr/local/hadoop/bin/hadoop namenode -format

Step 13: Start Hadoop

/usr/local/hadoop/bin/start-all.sh

Step 14: Check that all the processes are up and running

jps

The output should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus the Jps process itself).

Step 15: Stop Hadoop with the following command:

/usr/local/hadoop/bin/stop-all.sh

Step 16: Start Hadoop again

/usr/local/hadoop/bin/start-all.sh

Now you are ready to rock! Have fun :)

Originally published at victorleungtw.com on October 15, 2015.