Saturday 2 September 2017

Hadoop Common Commands:

Hadoop help usage:

command:
mano@Mano:~$ hadoop -help



1)archive:
Hadoop Archives:

Archives address the small-files problem in Hadoop:

Hadoop works best with big files; small files are handled inefficiently in HDFS. As we know, the Namenode holds the metadata for all files stored in HDFS in memory. Let's say we have a file in HDFS that is 1 GB in size; the Namenode will store metadata for that file: name, creator, creation timestamp, blocks, permissions, and so on.

Now assume we split this 1 GB file into 1000 pieces and store all 1000 "small" files in HDFS. The Namenode now has to hold metadata for 1000 files in memory. This is inefficient: first, it consumes far more memory, and second, the Namenode soon becomes a bottleneck as it tries to manage so many objects.
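To put rough numbers on this, a commonly cited rule of thumb (an assumption here; the exact figure varies by version) is that the Namenode uses about 150 bytes of heap per file-system object (file or block):

```shell
# Assumption: ~150 bytes of Namenode heap per object (file or block);
# this is a rough rule of thumb, not an exact figure.

# One 1 GB file with a 128 MB block size: 1 file + 8 blocks = 9 objects.
echo $(( (1 + 8) * 150 ))         # 1350 bytes of heap

# The same data as 1000 small files: 1000 files + 1000 blocks = 2000 objects.
echo $(( (1000 + 1000) * 150 ))   # 300000 bytes, over 200x as much
```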

What is HAR?

Hadoop Archives, or HAR, is an archiving facility that packs files into HDFS blocks efficiently, and hence HAR can be used to tackle the small-files problem in Hadoop. A Hadoop archive always has a *.har extension.

Note:
A HAR is created from a collection of files, and the archiving tool (a simple command) runs a MapReduce job to process the input files in parallel and create the archive file.

How to create HAR file:

Syntax:
hadoop archive -archiveName name -p <parent> [-r <replication factor>] <src>* <dest>

Command:

mano@Mano:~$ hadoop archive -archiveName Mano.har -p /MANO/Sqoop_import_table/employees /HAR

How to list HAR file :

Command:
mano@Mano:~$ hadoop fs -ls /HAR/Mano.har
Found 4 items
-rw-r--r--   1 mano supergroup          0 2017-09-02 16:44 /HAR/Mano.har/_SUCCESS
-rw-r--r--   5 mano supergroup        446 2017-09-02 16:44 /HAR/Mano.har/_index
-rw-r--r--   5 mano supergroup         22 2017-09-02 16:44 /HAR/Mano.har/_masterindex
-rw-r--r--   1 mano supergroup         65 2017-09-02 16:43 /HAR/Mano.har/part-0


A Hadoop archive directory contains metadata (in the form of _index and _masterindex) and data (part-*) files. The _index file contains the names of the files that are part of the archive and their locations within the part files.
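Once created, an archive can be read through the har:// filesystem scheme, which lets you list and read the original files without unpacking the archive. The commands below reuse the archive created above; the inner file name is hypothetical and depends on what was archived:

```shell
# List the original files packed inside the archive (har:// scheme).
hadoop fs -ls har:///HAR/Mano.har

# Files can be read back directly through the same scheme
# (part-m-00000 is a hypothetical inner file name).
hadoop fs -cat har:///HAR/Mano.har/part-m-00000
```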

2)checknative

This command checks the availability of the Hadoop native code

Syntax:

Usage: hadoop checknative [-a] [-h]

-a : check that all native libraries are available
-h : print the help message

Commands:
mano@Mano:~$ hadoop checknative -a

mano@Mano:~$ hadoop checknative -h



3)classpath
Prints the class path needed to get the Hadoop jar and the required libraries.

Syntax:
Usage: hadoop classpath [--glob |--jar <path> |-h |--help]


The --glob argument expands the wildcard entries, so the class path is shown without wildcards.

Command:
mano@Mano:~$ hadoop classpath



mano@Mano:~$ hadoop classpath --glob
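A common use of this command (a sketch; MyTool.java is a hypothetical class that uses Hadoop APIs) is to feed its output to the Java compiler or JVM so that Hadoop's jars are on the class path:

```shell
# Compile a class against the Hadoop libraries (MyTool.java is hypothetical).
javac -cp "$(hadoop classpath)" MyTool.java

# Run it with the same class path, plus the current directory.
java -cp "$(hadoop classpath):." MyTool
```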

4)distch
Change the ownership and permissions on many files at once.

Syntax:
Usage: hadoop distch [-f urilist_url] [-i] [-log logdir] path:owner:group:permissions



Options:
-f <urilist_url> : list of objects to change
-i : ignore failures
-log <logdir> : directory to log output
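For example (the paths, owner, and group below are hypothetical), the following sets owner, group, and permissions on an entire directory tree in one distributed job:

```shell
# Change owner to 'mano', group to 'supergroup', permissions to 755
# on everything under /data/logs (hypothetical path), logging to /tmp/distch_log.
hadoop distch -log /tmp/distch_log /data/logs:mano:supergroup:755

# A field can be left empty to keep it unchanged, e.g. change only permissions:
hadoop distch /data/logs:::755
```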

5)distcp - distributed copy
Copies files or directories recursively as a distributed job.

distcp is a general utility for copying large data sets between distributed file systems within and across clusters. The distcp command submits a regular MapReduce job that performs a file-by-file copy.

Example:

Copy a directory from one node in the cluster to another
hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop
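distcp accepts options that control the copy; two commonly used ones are -update (copy only files that are missing or differ at the target) and -overwrite (unconditionally overwrite the target). A sketch, reusing the same hypothetical namenodes as above:

```shell
# Copy only files that are missing or have changed at the destination.
hadoop distcp -update hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop

# Overwrite the destination and preserve file attributes
# (-p preserves attributes such as replication, block size, user, group, permission).
hadoop distcp -overwrite -p hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop
```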

6)fs
It is a synonym for hdfs dfs when HDFS is in use.

7)jar
Runs a jar file.

Syntax:
Usage: hadoop jar <jar> [mainClass] args...
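For example, running the word-count example that ships with Hadoop (the jar path and the HDFS directories below are assumptions; the examples jar name varies by version):

```shell
# Run the bundled wordcount example; /input must exist and /output must not.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /input /output
```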

8)version

Prints the hadoop version

Command:
mano@Mano:~$ hadoop version




9)CLASSNAME
Runs the class named CLASSNAME. The class must be part of a package.

Syntax:

Usage: hadoop CLASSNAME
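For instance, the hadoop version command is essentially a shortcut for running Hadoop's VersionInfo class, so invoking that class directly prints the same information (a sketch; behavior may differ slightly across versions):

```shell
# Run a class from the Hadoop jars directly; prints the Hadoop version details.
hadoop org.apache.hadoop.util.VersionInfo
```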

10)dfsadmin -safemode

Safe mode puts HDFS into a read-only state: the Namenode serves read requests but accepts no modifications to the file system, and no block replication or deletion takes place. (Note: in recent Hadoop versions, hadoop dfsadmin is deprecated in favour of hdfs dfsadmin.)

Commands:
1)To get help/to see available options
hadoop@Mano:~$ hadoop dfsadmin -safemode

2) To enter into safemode
hadoop@Mano:~$ hadoop dfsadmin -safemode enter

3)To leave safemode
hadoop@Mano:~$ hadoop dfsadmin -safemode leave

4) To check safemode status
hadoop@Mano:~$ hadoop dfsadmin -safemode get

5) To wait (block) until safemode is exited
hadoop@Mano:~$ hadoop dfsadmin -safemode wait
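Because -safemode wait blocks until the Namenode leaves safe mode, it is handy at the top of scripts that must not start while HDFS is read-only (the jar, class, and paths below are hypothetical):

```shell
# Block until HDFS is writable, then kick off a job.
hadoop dfsadmin -safemode wait && \
    hadoop jar my-job.jar com.example.MyJob /input /output
```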



