How to Setup Hadoop 2.8.0 (Single Node Cluster) on CentOS

Introduction

Apache Hadoop 2.8.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.3.

The Apache Hadoop 2.8.0 release notes list the following features and improvements:

  • Common
    • Support async call retry and failover which can be used in async DFS implementation with retry effort.
    • Cross Frame Scripting (XFS) prevention for UIs can be provided through a common servlet filter.
    • S3A improvements: add ability to plug in any AWSCredentialsProvider, support reading S3A credentials from the Hadoop credential provider API in addition to XML configuration files, support Amazon STS temporary credentials
    • WASB improvements: adding append API support
    • Build enhancements: replace dev-support with wrappers to Yetus, provide a docker based solution to setup a build environment, remove CHANGES.txt and rework the change log and release notes.
    • Add posixGroups support for LDAP groups mapping service.
    • Support integration with Azure Data Lake (ADL) as an alternative Hadoop-compatible file system.
  • HDFS
    • WebHDFS enhancements: integrate CSRF prevention filter in WebHDFS, support OAuth2 in WebHDFS, disallow/allow snapshots via WebHDFS
    • Allow long-running Balancer to log in with keytab
    • Add ReverseXML processor which reconstructs an fsimage from an XML file. This will make it easy to create fsimages for testing, and manually edit fsimages when there is corruption
    • Support nested encryption zones
    • DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness. This can prevent the NameNode from incorrectly marking DataNodes as stale or dead in highly overloaded clusters where heartbeat processing is suffering delays.
    • Logging HDFS operation’s caller context into audit logs
    • A new datanode command for evicting writers which is useful when data node decommissioning is blocked by slow writers.
  • YARN
    • NodeManager CPU resource monitoring in Windows.
    • More graceful NM shutdown: the NM unregisters from the RM immediately rather than waiting for the timeout to be marked LOST (if NM work preserving is not enabled).
    • Add ability to fail a specific AM attempt when that AM attempt gets stuck.
    • CallerContext support in YARN audit log.
    • ATS versioning support: a new configuration to indicate timeline service version.
  • MAPREDUCE
    • Allow node labels to be specified when submitting MR jobs
    • Add a new tool to combine aggregated logs into a HAR file

       Reference: hadoop.apache.org

This blog post walks through installing Hadoop 2.8.0 on the CentOS operating system, including the basic configuration required to start working with Hadoop. The entire process is explained in simple steps.

Step 1 – Installing Java

Hadoop requires Java on every system it runs on, so before installing Hadoop, make sure Java is installed on your system.
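One way to run this check, as a minimal sketch: the snippet looks for a java binary on the PATH and prints its version if one is found. The java_status variable is only an illustrative name.

```shell
# Check whether a Java runtime is already available on the PATH.
if command -v java >/dev/null 2>&1; then
    java_status="installed"
    java -version 2>&1    # java prints its version information to stderr
else
    java_status="missing"
fi
echo "Java is $java_status"
```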

 

If Java is not installed, install OpenJDK 8.
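A possible way to do this with yum is sketched below. It assumes the stock CentOS repositories and the package names java-1.8.0-openjdk and java-1.8.0-openjdk-devel; the -devel package adds JDK tools such as jps, which is handy later for listing Hadoop processes. The JAVA_PACKAGES variable is just an illustrative name.

```shell
# OpenJDK 8 runtime and development packages (assumed CentOS package names).
JAVA_PACKAGES="java-1.8.0-openjdk java-1.8.0-openjdk-devel"

if command -v yum >/dev/null 2>&1; then
    # Left unquoted on purpose so both package names are passed separately.
    sudo yum install -y $JAVA_PACKAGES || echo "yum install failed" >&2
else
    echo "yum not found; this step assumes a CentOS system"
fi
```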

 

After installing Java, set the Java environment variables in /etc/profile.d/java.sh:

export JAVA_HOME=/usr/lib/jvm/java-openjdk
export JAVA_PATH=$JAVA_HOME
export PATH=$PATH:$JAVA_HOME/bin

Step 2 – Setup Hadoop user account

It is recommended to create a non-root user account for the Hadoop environment. This guide uses an account named hadoop.
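A minimal sketch of creating that account follows. The user name hadoop is an assumption taken from the /user/hadoop directory used later in this guide; setting the password is interactive, so it appears only as a comment.

```shell
# Account name assumed from the /user/hadoop HDFS directory used in Step 7.
HADOOP_USER="hadoop"

if id "$HADOOP_USER" >/dev/null 2>&1; then
    echo "User $HADOOP_USER already exists"
elif [ "$(id -u)" -eq 0 ]; then
    if useradd "$HADOOP_USER"; then
        echo "Created user $HADOOP_USER"
    else
        echo "useradd failed" >&2
    fi
    # Then set a password interactively: passwd hadoop
else
    echo "Run this as root (or through sudo) to create $HADOOP_USER"
fi
```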

 

Set up key-based SSH from the hadoop account to itself.
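The usual approach is to generate a key pair without a passphrase and add the public key to the account's own authorized_keys file. The sketch below assumes it is run as the hadoop user (for example after su - hadoop) and that the default key path ~/.ssh/id_rsa is acceptable.

```shell
# Run as the hadoop user, not as root.
SSH_DIR="$HOME/.ssh"
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"

if command -v ssh-keygen >/dev/null 2>&1; then
    # Generate an RSA key pair with an empty passphrase unless one exists.
    [ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -q -t rsa -P '' -f "$SSH_DIR/id_rsa"
    # Authorize the public key for logins to this same account.
    cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
    chmod 600 "$SSH_DIR/authorized_keys"
else
    echo "ssh-keygen not found; install the OpenSSH client tools first"
fi
```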

 

Check that key-based login works, then exit the SSH session.
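Interactively, the check is simply ssh localhost followed by exit. A non-interactive variant is sketched here; the options are standard OpenSSH client options, and disabling strict host key checking is an assumption that is only reasonable for this local test. The ssh_status variable is an illustrative name.

```shell
# BatchMode makes ssh fail instead of prompting for a password, so a
# successful run shows that key-based login works.
if ssh -o BatchMode=yes -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
       localhost true 2>/dev/null; then
    ssh_status="ok"
else
    ssh_status="failed"
fi
echo "Key-based login to localhost: $ssh_status"
```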

 

Step 3 – Download Hadoop release file

Download the Hadoop 2.8.0 release archive. For other versions, see http://hadoop.apache.org
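The download and unpacking could look roughly like the sketch below. The archive.apache.org URL, the /tmp working directory, and the fetch_hadoop helper name are assumptions made for illustration; the target directory matches the HADOOP_HOME value used in Step 4, and the chown assumes the hadoop user and group from Step 2.

```shell
HADOOP_VERSION="2.8.0"
# Assumed download location: the Apache release archive.
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
# Matches the HADOOP_HOME value set in Step 4.
HADOOP_HOME="/usr/local/hadoop"

# Hypothetical helper: download, unpack, and move the release into place.
fetch_hadoop() {
    (
        cd /tmp &&
        wget "$HADOOP_URL" &&
        tar xzf "hadoop-${HADOOP_VERSION}.tar.gz" &&
        sudo mv "hadoop-${HADOOP_VERSION}" "$HADOOP_HOME" &&
        sudo chown -R hadoop:hadoop "$HADOOP_HOME"
    )
}

echo "Call fetch_hadoop to download $HADOOP_URL"
```

Calling fetch_hadoop from an account with sudo rights performs the actual download; defining the function first makes it easy to review the URL and target directory before anything is fetched.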

 

Step 4 – Configure Hadoop Pseudo-Distributed Mode

  1. Setup Environment Variables

Edit the ~/.bashrc file of the hadoop user and append the following lines at the end of the file.

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now apply the changes to the current shell session by running source ~/.bashrc.

 

Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and set JAVA_HOME:

# Change the Java home path to match the Java installation on your system
export JAVA_HOME=/usr/lib/jvm/java-openjdk

  2. Edit Configuration Files

Hadoop has several configuration files, located in $HADOOP_HOME/etc/hadoop, which need to be adjusted to the requirements of your Hadoop environment.

 

i) Edit core-site.xml

ii) Edit hdfs-site.xml

iii) Edit mapred-site.xml

iv) Edit yarn-site.xml
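The original property values for these four files are not shown here, so the sketch below falls back to the minimal pseudo-distributed values from the Apache Hadoop single-node setup guide: fs.defaultFS, dfs.replication, mapreduce.framework.name, and yarn.nodemanager.aux-services. Treat them as assumptions rather than the exact settings of this tutorial. The write_site_file helper and the CONF_DIR variable are hypothetical names. Note that the helper overwrites existing files, and that Hadoop 2.x ships only a mapred-site.xml.template, so mapred-site.xml has to be created.

```shell
# Use the real config directory when HADOOP_HOME is set; otherwise write to
# a scratch directory so the generated files can be inspected first.
CONF_DIR="${HADOOP_HOME:+$HADOOP_HOME/etc/hadoop}"
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$CONF_DIR"

# Hypothetical helper: write a site file that contains a single property.
# Arguments: file name, property name, property value.
write_site_file() {
    cat > "$CONF_DIR/$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

write_site_file core-site.xml   fs.defaultFS                  hdfs://localhost:9000
write_site_file hdfs-site.xml   dfs.replication               1
write_site_file mapred-site.xml mapreduce.framework.name      yarn
write_site_file yarn-site.xml   yarn.nodemanager.aux-services mapreduce_shuffle

echo "Wrote site files to $CONF_DIR"
```

A replication factor of 1 fits a single-node cluster because there is only one DataNode to hold each block.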
 

  3. Format Hadoop Namenode

Once the single-node cluster is configured, initialize the HDFS file system by formatting the NameNode.
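The format step uses the hdfs command from $HADOOP_HOME/bin. The sketch below assumes the PATH changes from ~/.bashrc are active; format_status is an illustrative variable name.

```shell
if command -v hdfs >/dev/null 2>&1; then
    # Formatting erases existing HDFS metadata; run it once, on a new cluster.
    if hdfs namenode -format; then
        format_status="done"
    else
        format_status="failed"
    fi
else
    format_status="skipped"
    echo "hdfs not found on PATH; check the environment variables from Step 4"
fi
echo "NameNode format: $format_status"
```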

 

Step 5 – Start Hadoop Cluster

Start the Hadoop cluster using the scripts provided by Hadoop. Navigate to the Hadoop sbin directory ($HADOOP_HOME/sbin) and run the scripts one by one.

Run start-dfs.sh to start the NameNode, DataNode, and SecondaryNameNode, then run start-yarn.sh to start the YARN ResourceManager and NodeManager.
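As a sketch, the start sequence can be run with full paths so that the current directory does not matter. The fallback to /usr/local/hadoop mirrors the HADOOP_HOME value from Step 4, and cluster_status is an illustrative variable name.

```shell
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

if [ -x "$HADOOP_HOME/sbin/start-dfs.sh" ]; then
    "$HADOOP_HOME/sbin/start-dfs.sh"     # NameNode, DataNode, SecondaryNameNode
    "$HADOOP_HOME/sbin/start-yarn.sh"    # ResourceManager, NodeManager
    # jps ships with the JDK and lists the running Java processes.
    jps
    cluster_status="start requested"
else
    cluster_status="scripts not found"
fi
echo "Cluster: $cluster_status"
```

If everything came up, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.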

 

Step 6 – Check Hadoop Services

Open port 50070 in a browser for information about the NameNode:

http://HOST_NAME:50070/

Open port 8088 for information about the cluster (YARN ResourceManager):

http://HOST_NAME:8088/

Open port 50090 for information about the SecondaryNameNode:

http://HOST_NAME:50090/

Open port 50075 for information about the DataNode:

http://HOST_NAME:50075/
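From the command line, the same four ports can be probed with curl, as sketched below. HOST_NAME defaults to localhost here, which is an assumption; set it to your server's host name if you check from another machine. The checked counter is illustrative.

```shell
HOST_NAME="${HOST_NAME:-localhost}"
checked=0

# NameNode, ResourceManager, SecondaryNameNode, DataNode web UIs.
for port in 50070 8088 50090 50075; do
    if curl -s -o /dev/null --max-time 5 "http://$HOST_NAME:$port/"; then
        echo "port $port: reachable"
    else
        echo "port $port: not reachable"
    fi
    checked=$((checked + 1))
done
```

If a port is not reachable from a remote machine but works locally, a firewall rule on the CentOS host is a likely cause.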

Step 7 – Test Hadoop Setup

i) Make the HDFS directories (run from the $HADOOP_HOME directory)

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/hadoop
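A small smoke test can confirm that HDFS accepts writes. The sketch below is an illustration and not part of the original steps: it creates the directories, uploads /etc/hosts as an arbitrary small file, and lists the result. hdfs_status is an illustrative variable name.

```shell
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /user/hadoop             # -p also creates /user if needed
    hdfs dfs -put -f /etc/hosts /user/hadoop/   # -f overwrites an existing copy
    hdfs dfs -ls /user/hadoop                   # the uploaded file should be listed
    hdfs_status="checked"
else
    hdfs_status="skipped"
    echo "hdfs not found on PATH; check the environment variables from Step 4"
fi
```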

Manage Hadoop Services

To start all Hadoop instances, run start-dfs.sh and then start-yarn.sh from the Hadoop sbin directory.

 

To stop all Hadoop instances, run stop-yarn.sh and then stop-dfs.sh from the Hadoop sbin directory.
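For convenience, both sequences can be wrapped in one shell function, for example in ~/.bashrc. The hadoop_services name is hypothetical; the four scripts it calls are the ones shipped in $HADOOP_HOME/sbin.

```shell
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Hypothetical wrapper: start or stop HDFS and YARN together.
hadoop_services() {
    case "$1" in
        start)
            "$HADOOP_HOME/sbin/start-dfs.sh" && "$HADOOP_HOME/sbin/start-yarn.sh" ;;
        stop)
            # Stop YARN first, then HDFS (reverse of the start order).
            "$HADOOP_HOME/sbin/stop-yarn.sh" && "$HADOOP_HOME/sbin/stop-dfs.sh" ;;
        *)
            echo "usage: hadoop_services start|stop" >&2
            return 2 ;;
    esac
}
```

With the function loaded, hadoop_services start brings the cluster up and hadoop_services stop shuts it down.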

 

I hope this article helped you set up Hadoop 2.8.0 (Single Node Cluster) on CentOS. If you have any doubts or queries, please comment below. For updates, follow agiratechnologies.

Saravana

An enthusiastic tech lead with more than 7 years of experience in web development, including Ruby, Ruby on Rails, AngularJS, DevOps, and Golang. He also never misses a chance to get his hands on planting and gardening, even on busy weekends.