Hadoop Common Commands:
Hadoop help usage:
Command:
mano@Mano:~$ hadoop -help
1)archive
Hadoop Archives:
Archives address the small files problem in Hadoop:
Hadoop works best with big files; small files are handled inefficiently in HDFS. The Namenode holds the metadata for every file stored in HDFS in memory. Say we have a 1 GB file in HDFS: the Namenode stores metadata for that file – file name, owner, creation timestamp, block list, permissions, and so on.
Now assume we split this 1 GB file into 1,000 pieces and store all 1,000 "small" files in HDFS. The Namenode now has to keep metadata for 1,000 files in memory. This is inefficient – it consumes far more memory, and the Namenode soon becomes a bottleneck as it manages ever more objects.
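To put rough numbers on it (a back-of-the-envelope estimate, assuming the commonly cited figure of about 150 bytes of Namenode heap per file or block object and a 128 MB block size): the single 1 GB file costs 1 file object plus 8 block objects, roughly 1.4 KB of heap, while 1,000 small files cost 1,000 file objects plus 1,000 block objects, roughly 300 KB – around 200 times more metadata for the same data.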
What is HAR?
Hadoop Archives (HAR) is an archiving facility that packs files into HDFS blocks efficiently, so it can be used to tackle the small files problem in Hadoop. A Hadoop archive always has a *.har extension.
Note:
A HAR file is created from a collection of files; the archiving tool (a simple command) runs a MapReduce job to process the input files in parallel and build the archive.
How to create a HAR file:
Syntax:
Usage: hadoop archive -archiveName name -p <parent> [-r <replication factor>] <src>* <dest>
Command:
mano@Mano:~$ hadoop archive -archiveName Mano.har -p /MANO/Sqoop_import_table/employees /HAR
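The optional -r flag from the usage above sets the replication factor of the archive files; a sketch reusing the same paths:
mano@Mano:~$ hadoop archive -archiveName Mano.har -p /MANO/Sqoop_import_table/employees -r 3 /HAR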
How to list a HAR file:
Command:
mano@Mano:~$ hadoop fs -ls /HAR/Mano.har
Found 4 items
-rw-r--r-- 1 mano supergroup 0 2017-09-02 16:44 /HAR/Mano.har/_SUCCESS
-rw-r--r-- 5 mano supergroup 446 2017-09-02 16:44 /HAR/Mano.har/_index
-rw-r--r-- 5 mano supergroup 22 2017-09-02 16:44 /HAR/Mano.har/_masterindex
-rw-r--r-- 1 mano supergroup 65 2017-09-02 16:43 /HAR/Mano.har/part-0
A Hadoop archive directory contains metadata (in the form of _index and _masterindex) and data (part-*) files. The _index file contains the names of the files that are part of the archive and their locations within the part files.
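Listing the archive directly, as above, shows its internal files; to list the logical contents of the archive, use the har:// URI scheme. A minimal sketch, reusing the archive created above:
mano@Mano:~$ hadoop fs -ls har:///HAR/Mano.har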
2)checknative
This command checks the availability of the Hadoop native code
Syntax:
Usage: hadoop checknative [-a] [-h]
-a : check that all libraries are available
-h : print the help message
Commands:
mano@Mano:~$ hadoop checknative -a
mano@Mano:~$ hadoop checknative -h
3)classpath
Prints the class path needed to get the Hadoop jar and the required libraries.
Syntax:
Usage: hadoop classpath [--glob |--jar <path> |-h |--help]
The --glob option expands the wildcard entries, so the class path is printed with every jar listed explicitly instead of wildcards.
Command:
mano@Mano:~$ hadoop classpath
mano@Mano:~$ hadoop classpath --glob
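The --jar option from the usage above writes the class path into the manifest of a jar file instead of printing it; a sketch, where /tmp/hadoop-cp.jar is only an example output path:
mano@Mano:~$ hadoop classpath --jar /tmp/hadoop-cp.jar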
4)distch
Changes the ownership and permissions of many files at once.
Syntax:
Usage: hadoop distch [-f urilist_url] [-i] [-log logdir] path:owner:group:permissions
-f : list of objects to change
-i : ignore failures
-log : directory to log output
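Command (a sketch; the path, owner, and group here are hypothetical, and the permissions are octal):
mano@Mano:~$ hadoop distch /user/mano/data:mano:supergroup:755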
5)distcp - distributed copy
Copies files or directories recursively using a distributed job.
distcp is a general utility for copying large data sets between distributed file systems within and across clusters. The distcp command submits a regular MapReduce job that performs a file-by-file copy.
Example:
Copy a directory from one cluster to another:
hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop
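distcp also accepts flags such as -update, which copies only the files that are missing or differ at the destination; a sketch reusing the same paths:
hadoop distcp -update hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop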
6)fs
It is a synonym for hdfs dfs when HDFS is in use.
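For example, the following two commands are equivalent on an HDFS cluster:
mano@Mano:~$ hadoop fs -ls /
mano@Mano:~$ hdfs dfs -ls /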
7)jar
Runs a jar file.
Syntax:
Usage: hadoop jar <jar> [mainClass] args...
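Example (the jar path varies by installation, and /input and /output are hypothetical HDFS paths): running the wordcount program from the bundled MapReduce examples jar:
mano@Mano:~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output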
8)version
Prints the Hadoop version.
Command:
mano@Mano:~$ hadoop version
9)CLASSNAME
Runs the class named CLASSNAME. The class must be part of a package.
Syntax:
Usage: hadoop CLASSNAME
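Example (assuming your Hadoop version ships org.apache.hadoop.conf.Configuration with a main method, which dumps the loaded configuration as XML):
mano@Mano:~$ hadoop org.apache.hadoop.conf.Configuration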
10)dfsadmin -safemode
Safe mode is a read-only state of HDFS: the Namenode serves read requests but does not allow any modifications to the file system.
Commands:
1) To see the available options (running without an argument prints the usage)
hadoop@Mano:~$ hadoop dfsadmin -safemode
2) To enter safe mode
hadoop@Mano:~$ hadoop dfsadmin -safemode enter
3) To leave safe mode
hadoop@Mano:~$ hadoop dfsadmin -safemode leave
4) To get the current safe mode status
hadoop@Mano:~$ hadoop dfsadmin -safemode get
5) To wait until safe mode is exited
hadoop@Mano:~$ hadoop dfsadmin -safemode wait
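Note: on Hadoop 2 and later, hadoop dfsadmin is deprecated in favour of the hdfs script, so the equivalent modern command is:
hadoop@Mano:~$ hdfs dfsadmin -safemode get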