Tuesday 1 August 2017

3) More on HDFS NameNode

HDFS NameNode:
i) The NameNode daemon must be running at all times

Note:
             – If the NameNode stops, the cluster becomes inaccessible
             – The HDFS administrator must ensure that the NameNode hardware is reliable

ii) The NameNode holds all of its metadata in RAM for fast access
            – It keeps a record of changes (the edit log) on disk for crash recovery
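The on-disk record mentioned above consists of a checkpoint image (fsimage) and edit-log segments, which can be inspected with Hadoop's offline viewer tools. A minimal sketch is below; the file paths are examples and depend on the `dfs.namenode.name.dir` setting in your cluster's configuration.

```shell
# Dump a checkpoint image (fsimage) to XML with the Offline Image Viewer
# (the path and transaction ID below are examples):
hdfs oiv -i /data/dfs/name/current/fsimage_0000000000000000042 \
         -o /tmp/fsimage.xml -p XML

# Dump an edit-log segment with the Offline Edits Viewer:
hdfs oev -i /data/dfs/name/current/edits_0000000000000000001-0000000000000000042 \
         -o /tmp/edits.xml
```

Both tools work on copies of the metadata files, so they can be run without touching a live NameNode.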

iii) A separate daemon known as the Secondary NameNode performs some housekeeping tasks for the NameNode
          – Be careful: the Secondary NameNode is not a backup NameNode!

Key enhanced implementations of the NameNode daemon:
*NameNode High Availability in CDH4 (Cloudera's Distribution, including Apache Hadoop)
  i) CDH4 introduced High Availability for the NameNode. Instead of a single NameNode, there are now two:
       – An Active NameNode
       – A Standby NameNode

ii)If the Active NameNode fails, the Standby NameNode can automatically take over

iii) The Standby NameNode also does the housekeeping work performed by the Secondary NameNode in 'classic' HDFS
          – HA HDFS does not run a Secondary NameNode daemon
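On an HA cluster, the state of each NameNode can be checked and switched with the `hdfs haadmin` command. A small sketch follows; `nn1` and `nn2` are example service IDs, and the real names come from the `dfs.ha.namenodes.<nameservice>` property in hdfs-site.xml.

```shell
# Check which NameNode is Active ("nn1"/"nn2" are assumed service IDs):
hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
hdfs haadmin -getServiceState nn2

# Manually fail over from nn1 to nn2 (administrators only):
hdfs haadmin -failover nn1 nn2
```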

Note: The HDFS administrator will choose whether or not to set the cluster up with NameNode High Availability

Points To Note on HDFS :
*Although files are split into 64MB or 128MB blocks, a file smaller than the block size does not consume a full block of storage; it occupies only its actual size
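The point above can be verified with `hadoop fs -stat`, which prints a file's actual size alongside its configured block size. The file name below is an example.

```shell
# small.txt is an assumed example file on the local disk
hadoop fs -put /home/hadoop/small.txt Mano
hadoop fs -stat "size: %b bytes, block size: %o bytes" Mano/small.txt
```

A 1 KB file would report a size of about 1024 bytes even though the reported block size might be 134217728 (128 MB); HDFS does not pad the file out to a full block on disk.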

*Without the metadata on the NameNode, there is no way to access the files in the HDFS cluster

How a client application reads an HDFS file:
i) It communicates with the NameNode to determine which blocks make up the file, and on which DataNodes those blocks reside

ii)It then communicates directly with the DataNodes to read the data
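The block-to-DataNode mapping that the NameNode hands to clients can be viewed directly with `hdfs fsck`. The sketch below assumes the sample file used later in this post, stored under the user's home directory in HDFS.

```shell
# Show the blocks that make up the file and the DataNodes holding them:
hdfs fsck /user/hadoop/Mano/sample.txt -files -blocks -locations
```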

Accessing HDFS
Access to HDFS from the command line is achieved with the hadoop fs command
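Before the worked examples, note that `hadoop fs` is self-documenting:

```shell
# List every hadoop fs subcommand with a short description:
hadoop fs -help

# Show the usage line for a single subcommand:
hadoop fs -usage ls
```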

hadoop fs Examples
1)Create a directory called Mano under the user's home directory
hadoop fs -mkdir Mano

2)Copy the file sample.txt from the local disk to the Mano directory in HDFS
hadoop fs -put /home/hadoop/Mano/sample.txt Mano




3)Get a directory listing of the user's home directory in HDFS
hadoop fs -ls

4)Get a directory listing of the HDFS root directory
hadoop fs -ls /

5)Display the contents of the HDFS file Mano/sample.txt

hadoop fs -cat Mano/sample.txt

6)Copy that file to the local disk, naming it from_HDFS.txt

 hadoop fs -get Mano/sample.txt /home/hadoop/from_HDFS.txt

 Note: copyFromLocal is a synonym for put; copyToLocal is a synonym for get

7)Delete the directory Mano and all its contents

hadoop fs -rm -r Mano
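One caution on deletion: whether `-rm -r` permanently deletes or moves files to the user's `.Trash` directory depends on whether trash is enabled on the cluster (`fs.trash.interval` greater than 0 in core-site.xml).

```shell
# Delete permanently regardless of the cluster's trash setting:
hadoop fs -rm -r -skipTrash Mano
```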
