Thursday, 7 September 2017

7)Sqoop Tool4: sqoop-export

The export tool exports a set of files from HDFS back to an RDBMS. Exports are performed by multiple writers in parallel. Each writer uses a separate connection to the database; these have separate transactions from one another.

Note:
The target table must already exist in the database.
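
For example, the students_ex table used in the commands below could be created in MySQL ahead of time (a minimal sketch; the two-column schema is an assumption chosen to match the sample data in this page):

mysql> create table students_ex (id int, name varchar(50));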

Sqoop export process:
  • The default operation is to transform the input records into a set of INSERT statements that inject them into the database.
  • In "update mode," Sqoop generates UPDATE statements that replace existing records in the database.
  • In "call mode," Sqoop makes a stored procedure call for each record (a sketch follows below).
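
For call mode, a sketch of the invocation is below; the stored procedure name populate_students is a placeholder, and the procedure must take one parameter per exported column. Note that --call is used in place of --table:

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --export-dir '/MANO/Sqoop_import_table/students' --call populate_students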
Syntax:
$ sqoop export (generic-args) (export-args)
or
$ sqoop-export (generic-args) (export-args)

1)Common arguments:

--connect <jdbc-uri> : Specify JDBC connect string
--connection-manager <class-name> : Specify connection manager class to use
--driver <class-name> : Manually specify JDBC driver class to use
--hadoop-mapred-home <dir> : Override $HADOOP_MAPRED_HOME
--help : Print usage instructions
--password-file : Set path for a file containing the authentication password
-P : Read password from console
--password <password> : Set authentication password
--username <username> : Set authentication username
--verbose : Print more information while working
--connection-param-file <filename> : Optional properties file that provides connection parameters
--relaxed-isolation : Set connection transaction isolation to read uncommitted for the mappers
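
As an alternative to passing --password on the command line, the password can be stored in a file and referenced with --password-file (a sketch; /user/mano/sqoop.password is an assumed path on HDFS whose file contains only the password text):

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password-file /user/mano/sqoop.password --table students_ex --export-dir '/MANO/Sqoop_import_table/students'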
 
2)Validation arguments:

--validate : Enable validation of data copied; supports single-table copy only
--validator <class-name> : Specify validator class to use
--validation-threshold <class-name> : Specify validation threshold class to use
--validation-failurehandler <class-name> : Specify validation failure handler class to use
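
For instance, row-count validation can be added to an export with --validate (a sketch reusing the students_ex example; by default Sqoop compares the number of rows exported against the target table's row count):

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --table students_ex --export-dir '/MANO/Sqoop_import_table/students' --validate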

3)Export control arguments:

--columns <col,col,col…> : Columns to export to table
--direct : Use direct export fast path
--export-dir <dir> : HDFS source path for the export
-m,--num-mappers <n> : Use n map tasks to export in parallel
--table <table-name> : Table to populate
--call <stored-proc-name> : Stored procedure to call
--update-key <col-name> : Anchor column to use for updates. Use a comma-separated list of columns if there is more than one column.
--update-mode <mode> : Specify how updates are performed when new rows are found with non-matching keys in the database. Legal values for mode include updateonly (default) and allowinsert.
--input-null-string <null-string> : The string to be interpreted as null for string columns
--input-null-non-string <null-string> : The string to be interpreted as null for non-string columns
--staging-table <staging-table-name> : The table in which data will be staged before being inserted into the destination table
--clear-staging-table : Indicates that any data present in the staging table can be deleted
--batch : Use batch mode for underlying statement execution
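
Because an export is not atomic, a failed job can leave partial data in the target table. A staging table avoids this: rows are loaded into the staging table first and moved to the destination only if the whole job succeeds. A sketch is below; students_stage is an assumed empty table with the same schema as students_ex (staging cannot be combined with --update-key, --call, or --direct):

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --table students_ex --staging-table students_stage --clear-staging-table --export-dir '/MANO/Sqoop_import_table/students'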

Command: using --table, --export-dir

  • --table - Table to populate in MySQL
  • --export-dir - HDFS source path for the export

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --export-dir '/MANO/Sqoop_import_table/students' -m 1 --table students_ex

Command: using --columns argument

--columns - columns to export to the table

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --table students_ex_cols --export-dir '/MANO/Sqoop_import_table/students' --columns 'id,name'


After the export, only the id and name columns are populated in the MySQL table.

The exported rows contain duplicates because the target table has no primary key.

Command: using --update-key and --update-mode
--update-key - should be a primary key or a unique column
--update-mode - should be updateonly or allowinsert (allowinsert does both update and insert)

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --export-dir '/MANO/Sqoop_import_table/students' --table students_ex  --update-key id --update-mode allowinsert

Running this command, Sqoop generates UPDATE statements like the ones below:

UPDATE students_ex SET name='Mano' WHERE id=1;
UPDATE students_ex SET name='Chella' WHERE id=2;

Data on HDFS:

Using --update-key ==> no duplicates, and mode is allowinsert. The update key column should be a primary key or unique, so add one in MySQL first:

mysql> alter table students_ex add primary key(id);
Query OK, 0 rows affected (0.20 sec)
Records: 0  Duplicates: 0  Warnings: 0

4)Input parsing arguments:

--input-enclosed-by <char> : Sets a required field encloser
--input-escaped-by <char> : Sets the input escape character
--input-fields-terminated-by <char> : Sets the input field separator
--input-lines-terminated-by <char> : Sets the input end-of-line character
--input-optionally-enclosed-by <char> : Sets a field enclosing character
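
For example, if the HDFS files were written with tab-separated fields, the export should declare that delimiter (a sketch; /MANO/Sqoop_import_table/students_tab is an assumed directory containing tab-delimited data):

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --table students_ex --export-dir '/MANO/Sqoop_import_table/students_tab' --input-fields-terminated-by '\t' --input-lines-terminated-by '\n'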

5)Output line formatting arguments:

--enclosed-by <char> : Sets a required field enclosing character
--escaped-by <char> : Sets the escape character
--fields-terminated-by <char> : Sets the field separator character
--lines-terminated-by <char> : Sets the end-of-line character
--mysql-delimiters : Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char> : Sets a field enclosing character

Sqoop automatically generates code to parse and interpret the records of the files containing the data to be exported back to the database. If these files were created with non-default delimiters (the defaults are comma-separated fields with newline-separated records), you should specify the same delimiters again so that Sqoop can parse your files.

If you specify incorrect delimiters, Sqoop will fail to find enough columns per line. This will cause export map tasks to fail by throwing ParseExceptions.
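
As an illustration, if a table was imported with pipe-separated fields, the export must declare the same delimiter again (a sketch; the /MANO/pipe_data directory is an assumed target):

mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_export --username root --password root --table students --target-dir '/MANO/pipe_data' --fields-terminated-by '|'
mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --table students_ex --export-dir '/MANO/pipe_data' --input-fields-terminated-by '|'
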
6)Code generation arguments:

--bindir <dir> : Output directory for compiled objects
--class-name <name> : Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file> : Disable code generation; use specified jar
--outdir <dir> : Output directory for generated code
--package-name <name> : Put auto-generated classes in this package
--map-column-java <m> : Override default mapping from SQL type to Java type for configured columns
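
For instance, the generated source and compiled class files can be redirected to chosen directories and the class given a custom name (a sketch; the /tmp paths and the StudentsRecord name are placeholders):

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_export --username root --password root --table students_ex --export-dir '/MANO/Sqoop_import_table/students' --outdir /tmp/sqoop_src --bindir /tmp/sqoop_classes --class-name StudentsRecord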

Note:
Exports may fail for a number of reasons:
  • Loss of connectivity from the Hadoop cluster to the database (either due to hardware fault, or server software crashes)
  • Attempting to INSERT a row which violates a consistency constraint (for example, inserting a duplicate primary key value)
  • Attempting to parse an incomplete or malformed record from the HDFS source data
  • Attempting to parse records using incorrect delimiters
  • Capacity issues (such as insufficient RAM or disk space)
Please follow the link for further details ==> Sqoop_Page8
