Step 4 - Install and configure Hadoop - is the last step in creating a single-node Hadoop setup. So far we have looked at installing a Linux-based OS (Ubuntu) for Hadoop, along with the reasons for preferring Linux over Windows, and we have installed and configured our chosen Java - Oracle Java.
Log in as HDUser. Download the Hadoop 2.x tar file from any Apache mirror.
Uncompress the Hadoop tar.gz file and move it to /usr/local. We will also change its owner. Use the terminal for all commands:
cd ~/Downloads
sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
cd /usr/local
sudo mv hadoop-2.2.0 hadoop
sudo chown -R hduser:hadoop hadoop
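To confirm the ownership change, list the directory; the owner and group should now show hduser and hadoop:
ls -ld /usr/local/hadoop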
Update HDUser's .bashrc file:
cd ~
gedit .bashrc
Add the text below at the end of the file. In JAVA_HOME, use the actual JDK folder name - something like "jdk-7-i386" (check in /usr/lib/jvm).
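If you are not sure of the exact folder name, you can list the installed JDKs first:
ls /usr/lib/jvm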
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of update
Save and close
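If you want the new variables available in the current terminal session right away, you can reload the file instead of logging out:
source ~/.bashrc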
Now open hadoop-env.sh to update JAVA_HOME. Find the existing line that sets JAVA_HOME and change it to the same value you used in .bashrc:
gksudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk/
Save and close. Reboot the system and log in as HDUser again.
Now verify the Hadoop installation from the terminal:
hadoop version
This should print something like the below:
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar
If you see this, congratulations!!! Hadoop is now successfully installed. If not, leave me a comment on the contact page.
Now we configure it by updating its XML files.
Open core-site.xml and add the given text between the <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Save and Close the file
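A side note: fs.default.name is the older name for this setting; Hadoop 2.x still accepts it but marks it as deprecated in favour of fs.defaultFS. If you prefer the newer name, use this instead:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>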
Open yarn-site.xml and add the given text between the <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Save and Close the file
Open mapred-site.xml.template and add the given text between the <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Save the file as mapred-site.xml in the /usr/local/hadoop/etc/hadoop/ directory and close the file.
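Alternatively, you can copy the template to its final name from the terminal first, and then edit mapred-site.xml directly:
sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml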
Let's now create the NameNode and DataNode directories through the terminal:
cd ~
mkdir -p mydata/hdfs/namenode
mkdir -p mydata/hdfs/datanode
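You can verify that both directories were created with:
ls -R ~/mydata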
Now, update hdfs-site.xml and add the given text between the <configuration> </configuration> tags. We set dfs.replication to 1 because a single-node setup has only one DataNode to hold block copies, and we point the NameNode and DataNode at the directories we just created.
gksudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
Next, we format HDFS for our first use of Hadoop and start the Hadoop services. Formatting is needed only this once; re-running it later would wipe the NameNode's metadata.
hdfs namenode -format
start-dfs.sh
start-yarn.sh
Verify that the Hadoop daemons are running with:
jps
Output like the below should appear (the process IDs will differ):
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
A DataNode process should also appear in the list; if it is missing, check the log files under /usr/local/hadoop/logs.
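You can also check the web interfaces: in Hadoop 2.x the NameNode UI runs at http://localhost:50070 and the ResourceManager UI at http://localhost:8088 by default. When you want to shut Hadoop down later, use the matching stop scripts:
stop-yarn.sh
stop-dfs.sh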
This completes the setup steps for Hadoop.
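As an optional smoke test, you can run one of the bundled example jobs - assuming the examples jar sits in its usual place under the Hadoop install - to estimate the value of Pi:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
If the job completes and prints an estimated value of Pi, your single-node cluster is working end to end.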
You can also learn about the usage of Hadoop and about Hadoop architecture on BabaGyan.com.