Install Hadoop on AWS Ubuntu Instance
15 October 2015
Step 1: Create an Ubuntu 14.04 LTS instance on AWS
Step 2: Connect to the instance
chmod 400 yourKey.pem
ssh -i yourKey.pem ubuntu@your_instance_ip
Step 3: Install Java
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java6-installer
sudo update-java-alternatives -s java-6-oracle
sudo apt-get install oracle-java6-set-default
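To verify that Java was installed correctly, check the reported version:
java -version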
Step 4: Add a Hadoop user
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
Step 5: Create SSH key for password-free login
su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Step 6: Try connection
ssh localhost
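On the first connection, ssh will ask you to confirm the host's fingerprint; type yes to continue.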
exit
Step 7: Download and Install Hadoop
cd /usr/local
sudo wget http://apache.01link.hk/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
sudo tar -xzvf hadoop-1.2.1.tar.gz
sudo mv hadoop-1.2.1 hadoop
sudo chown -R hduser:hadoop hadoop
sudo rm hadoop-1.2.1.tar.gz
Step 8: Update .bashrc
su - hduser
vim $HOME/.bashrc
Add the following content to the end of the file:
export HADOOP_PREFIX=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
export PATH=$PATH:$HADOOP_PREFIX/bin
Then save it with :wq and reload .bashrc:
source ~/.bashrc
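As a quick sanity check that the new PATH works, the hadoop command should now resolve and report version 1.2.1:
hadoop version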
Step 9: Configure Hadoop while logged in as hduser
cd /usr/local/hadoop/conf
vim hadoop-env.sh
Add the following lines to the file:
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export HADOOP_CLASSPATH=/usr/local/hadoop
Save and exit :wq
Step 10: Create a temporary directory for Hadoop
exit
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
Step 11: Add configuration snippets
su - hduser
cd /usr/local/hadoop/conf
vim core-site.xml
Put the following content between the <configuration> … </configuration> tags:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
Save and exit :wq
Also edit this file: vim mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
Save and exit :wq
And edit this file: vim hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
Save and exit :wq
Step 12: Format the HDFS
/usr/local/hadoop/bin/hadoop namenode -format
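If formatting succeeds, the output should end with a message that the storage directory under /app/hadoop/tmp has been successfully formatted.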
Step 13: Start Hadoop
/usr/local/hadoop/bin/start-all.sh
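Once everything is up, the NameNode web UI should be reachable on port 50070 and the JobTracker UI on port 50030 (the Hadoop 1.x defaults).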
Step 14: Check that all the processes are up and running
jps
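In this single-node setup, jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself). If any of them is missing, check the log files under /usr/local/hadoop/logs.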
Step 15: To stop Hadoop, type the following command:
/usr/local/hadoop/bin/stop-all.sh
Step 16: Start Hadoop again
/usr/local/hadoop/bin/start-all.sh
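As a final smoke test, you can run the bundled pi estimator example (assuming the examples jar sits in its default location inside the Hadoop directory) with 2 map tasks and 5 samples each:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar pi 2 5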
Now you are ready to rock! Have fun :)
Originally published at victorleungtw.com on October 15, 2015.