HDFS NameNode:
i)The NameNode daemon must be running at all times
Note:
– If the NameNode stops, the cluster becomes inaccessible
– The HDFS administrator must take care to ensure that the NameNode hardware is reliable!
ii)The NameNode holds all of its metadata in RAM for fast access
– It keeps a record of changes (the edit log) on disk for crash recovery
iii)A separate daemon known as the Secondary NameNode takes care of some housekeeping tasks for the NameNode
– Be careful: The Secondary NameNode is not a backup NameNode!
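A quick way to confirm that these daemons are actually up on a node is the JDK's jps utility, which lists the running Java processes; on a healthy ‘classic’ HDFS master the list should include NameNode (and SecondaryNameNode, if that daemon runs on the same machine):
jps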
Key enhanced implementations of the NameNode daemon:
*NameNode High Availability in CDH4 (Cloudera's Distribution, including Apache Hadoop)
i)CDH4 introduced High Availability for the NameNode. Instead of a single NameNode, there are now two:
– An Active NameNode
– A Standby NameNode
ii)If the Active NameNode fails, the Standby NameNode can automatically take over
iii)The Standby NameNode does the work performed by the Secondary NameNode in ‘classic’ HDFS
– HA HDFS does not run a Secondary NameNode daemon
Note: The HDFS administrator will choose whether or not to set the cluster up with NameNode High Availability
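On an HA cluster, the state of each NameNode can be queried from the command line with hdfs haadmin. The service IDs nn1 and nn2 below are only assumptions; they must match the names configured under dfs.ha.namenodes.<nameservice> in hdfs-site.xml:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
Each command prints active or standby for the NameNode in question.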
Points To Note on HDFS:
*Although files are split into 64MB or 128MB blocks, a file smaller than the block size does not consume a full 64MB/128MB of disk; a block only occupies as much space as the data it actually holds (see the -stat example after this list)
*Without the metadata on the NameNode, there is no way to access the files in the HDFS cluster
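For example, hadoop fs -stat can print a file's real size next to its block size, which shows that a small file does not tie up a whole block on disk (the path reuses the sample file from the examples further down):
hadoop fs -stat "size: %b bytes, block size: %o bytes" Mano/sample.txt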
How a client application reads an HDFS file:
i)It communicates with the NameNode to determine which blocks make up the file, and which DataNodes those blocks reside on
ii)It then communicates directly with the DataNodes to read the data
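This block-to-DataNode mapping, which the client obtains from the NameNode in step i, can also be inspected directly with the fsck tool (the path below is just an illustration; on older Hadoop 1.x installs the command is hadoop fsck):
hdfs fsck /user/hadoop/Mano/sample.txt -files -blocks -locations
The output lists each block of the file and the DataNodes holding its replicas.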
Accessing HDFS
Access to HDFS from the command line is achieved with the hadoop fs command
hadoop fs Examples
1)Create a directory called Mano under the user’s home directory
hadoop fs -mkdir Mano
2)Copy the file sample.txt from local disk to the Mano directory under the user’s home directory in HDFS
hadoop fs -put /home/hadoop/Mano/sample.txt Mano
3)Get a directory listing of the user’s home directory in HDFS
hadoop fs -ls
4)Get a directory listing of the HDFS root directory
hadoop fs -ls /
5)Display the contents of the HDFS file Mano/sample.txt
hadoop fs -cat Mano/sample.txt
6)Copy that file to the local disk, saving it as from_HDFS.txt
hadoop fs -get Mano/sample.txt /home/hadoop/from_HDFS.txt
Note: copyFromLocal is a synonym for put; copyToLocal is a synonym for get (equivalent commands are shown after example 7)
7)Delete the directory Mano and all its contents
hadoop fs -rm -r Mano
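As the note above says, examples 2 and 6 could equally be written with the synonyms:
hadoop fs -copyFromLocal /home/hadoop/Mano/sample.txt Mano
hadoop fs -copyToLocal Mano/sample.txt /home/hadoop/from_HDFS.txt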