Monday 4 September 2017

4) Sqoop Tool 1: sqoop-import (Contd.)

5) Incremental Imports:

Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.

Two types of incremental imports:

append - use append mode when importing a table where new rows are continually added with increasing row id values (only new insertions are captured).
lastmodified - use lastmodified mode when rows of the source table may be updated, and each such update sets the value of a last-modified column to the current timestamp.

Working Mechanism:

Based on the --incremental argument, Sqoop decides between the append and lastmodified modes; it then examines the column given by --check-column and imports only the rows whose value in that column is greater than --last-value (a general sketch of the command form follows the argument table below).




Argument: Description
--check-column (col): Specifies the column to be examined when determining which rows to import. (The column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR.)
--incremental (mode): Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value): Specifies the maximum value of the check column from the previous import.
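These three arguments are used together with the regular connection arguments; a minimal sketch of the general form (the angle-bracket placeholders are to be filled in for your environment, not actual values from the examples below):

sqoop-import --connect jdbc:mysql://<host>/<database> --username <user> --password <pass> --table <table> --check-column <col> --incremental <append|lastmodified> --last-value <value> --target-dir <hdfs-dir>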

Example: using append mode

Step 1: Table in the MySQL database, which acts as the source for the import into HDFS


Step 3: Normal import of the table to HDFS
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --target-dir /MANO/Sqoop_import_table/students

Data on HDFS after import 


Step 4: Incremental import in append mode for new records in the source database

Let's first insert a new record into the table and see the result.
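A sketch of the kind of insert used for the new record (the column names id, name, and last_update are assumed here, based on the imported data shown on this page):

mano@Mano:~$ mysql -u root -p sqoop_test -e "INSERT INTO students (id, name, last_update) VALUES (6, 'Govind', NOW());"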


Command:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --check-column id --incremental append --last-value 5 -m 1 --target-dir /MANO/Sqoop_import_table/students

Output:

17/09/04 12:50:44 INFO tool.ImportTool:  --incremental append
17/09/04 12:50:44 INFO tool.ImportTool:   --check-column id
17/09/04 12:50:44 INFO tool.ImportTool:   --last-value 6
17/09/04 12:50:44 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')

mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part-m-00001
6,Govind,2017-09-04

Example: using lastmodified mode

Step 1: After data updates on the database table
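A sketch of the kind of updates applied (column names assumed as before; last_update is set to the current timestamp explicitly, though it could also be refreshed automatically by an ON UPDATE CURRENT_TIMESTAMP definition):

mano@Mano:~$ mysql -u root -p sqoop_test -e "UPDATE students SET name = 'Chella', last_update = NOW() WHERE id = 1; UPDATE students SET name = 'Prasath', last_update = NOW() WHERE id = 2; UPDATE students SET name = 'Mano', last_update = NOW() WHERE id = 3;"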






Step 2: View existing data on HDFS

mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part*
1,Mano,2017-09-04
2,Govind,2017-09-04
3,Mahan,2017-09-04
4,Madhan,2017-09-04
5,Kishore,2017-09-04
6,Govind,2017-09-04
mano@Mano:~$

Step 3: Using the lastmodified incremental mode

Command:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --check-column last_update --incremental lastmodified --last-value 2017-09-04 --target-dir /MANO/Sqoop_import_table/students --append

Note:
We need to use the --append import control argument, since we are writing to an existing directory.

Output:
17/09/04 13:23:35 INFO mapreduce.ImportJobBase: Retrieved 6 records.
17/09/04 13:23:35 INFO util.AppendUtils: Appending to directory students
17/09/04 13:23:35 INFO util.AppendUtils: Using found partition 2
17/09/04 13:23:35 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
17/09/04 13:23:35 INFO tool.ImportTool:  --incremental lastmodified
17/09/04 13:23:35 INFO tool.ImportTool:   --check-column last_update
17/09/04 13:23:35 INFO tool.ImportTool:   --last-value 2017-09-04 13:22:07.0
17/09/04 13:23:35 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')

mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part-m-00002
1,Chella,2017-09-04
2,Prasath,2017-09-04
3,Mano,2017-09-04
4,Madhan,2017-09-04
5,Kishore,2017-09-04
6,Govind,2017-09-04

Note:
We may see a separate part file on HDFS for each import operation we perform.
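For example, listing the target directory shows the separate part files created by the imports so far (the exact part file names depend on the runs):

mano@Mano:~$ hadoop fs -ls /MANO/Sqoop_import_table/students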

Incremental Imports Status table:
 

Argument: Description - Example
--check-column (col): Specifies the column to be examined when determining which rows to import. (The column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR.) - Done
--incremental (mode): Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified. - Done
--last-value (value): Specifies the maximum value of the check column from the previous import. - Done

6) Output line formatting arguments:

When importing delimited files, the choice of delimiter is important. Delimiters which appear inside string-based fields may cause ambiguous parsing of the imported data by subsequent analysis passes.

Example:
The string "Hello, happy to see you" should not be imported with the end-of-field delimiter set to a comma; otherwise a downstream parser would split it into two fields, "Hello" and " happy to see you".

Argument: Description
--enclosed-by <char>: Sets a required field enclosing character
--escaped-by <char>: Sets the escape character
--fields-terminated-by <char>: Sets the field separator character
--lines-terminated-by <char>: Sets the end-of-line character
--mysql-delimiters: Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>: Sets a field enclosing character

Delimiters may be specified as:

  •     a character (--fields-terminated-by X)
  •     an escape character (--fields-terminated-by \t). Supported escape characters are:
        \b (backspace)
        \n (newline)
        \r (carriage return)
        \t (tab)
        \" (double-quote)
        \\' (single-quote)
        \\ (backslash)
        \0 (NUL) - This will insert NUL characters between fields or lines, or will disable enclosing/escaping if used for one of the --enclosed-by, --optionally-enclosed-by, or --escaped-by arguments.

Note:
The default delimiters are a comma (,) for fields, a newline (\n) for records, no quote character, and no escape character.

Command: using --mysql-delimiters
Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '

SQL data in the table:

Import using --mysql-delimiters

mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --mysql-delimiters
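The imported files can then be inspected as usual; since no --target-dir was given, the default target directory /user/mano/students is assumed here:

mano@Mano:~$ hadoop fs -cat /user/mano/students/part*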



Command: using --enclosed-by and --escaped-by

mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --enclosed-by "\\'" --escaped-by '\'

Import results:



Note: in the results above, we see the enclosed-by single quotes applied to all the fields.

To avoid that, we can use the --optionally-enclosed-by argument.

Command: using --optionally-enclosed-by

mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --optionally-enclosed-by '\"' --escaped-by '\'

SQL data in the table:


Import results:


  


Command: Using --fields-terminated-by and --lines-terminated-by
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --fields-terminated-by '|' --lines-terminated-by '\t'

Output:
mano@Mano:~$ hadoop fs -cat /user/mano/students/par*
1|Chella|2017-09-04    2|Prasath|2017-09-04    3|Mano|2017-09-04    4|Madhan|2017-09-04    5|Kishore|2017-09-04    6|Govind|2017-09-04    6|This could|2017-09-05    7|"Hey,dear"|2017-09-05
Output line formatting arguments status table:

Argument: Description - Example
--enclosed-by <char>: Sets a required field enclosing character - Done
--escaped-by <char>: Sets the escape character - Done
--fields-terminated-by <char>: Sets the field separator character - Done
--lines-terminated-by <char>: Sets the end-of-line character - Done
--mysql-delimiters: Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' - Done
--optionally-enclosed-by <char>: Sets a field enclosing character - Done

7) Input parsing arguments:

These arguments tell Sqoop how to parse input files (the enclosed-by/escaped-by characters, field and line terminators, and so on) when exporting files from HDFS to RDBMS tables.

Note:
These delimiters are the same as the ones set in the previous section for storing into HDFS files, which act as the input files when exporting from HDFS into the RDBMS.

Sqoop generates a Java class that can reinterpret the text files; its generated parse() method reinterprets this data.


Argument: Description
--input-enclosed-by <char>: Sets a required field encloser
--input-escaped-by <char>: Sets the input escape character
--input-fields-terminated-by <char>: Sets the input field separator
--input-lines-terminated-by <char>: Sets the input end-of-line character
--input-optionally-enclosed-by <char>: Sets a field enclosing character

These work the same way as the output line formatting arguments above; I will update the examples for this section as well soon 😊
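In the meantime, here is a minimal sketch of how these arguments might look on an export, assuming the students data on HDFS uses '|' as the field separator and the default newline as the record terminator (the target table students_export is just a placeholder, not from the examples above):

mano@Mano:~$ sqoop-export --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students_export --export-dir /user/mano/students -m 1 --input-fields-terminated-by '|' --input-lines-terminated-by '\n'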

8) Hive arguments:
Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data’s layout in Hive. Importing data into Hive is as simple as adding the --hive-import option to your Sqoop command line.

Use the --hive-home option to identify the Hive installation directory (for cases where multiple Hive installations exist or you need to point to one manually); Sqoop will use $HIVE_HOME/bin/hive from there.
Argument: Description
--hive-home <dir>: Override $HIVE_HOME
--hive-import: Import tables into Hive (uses Hive’s default delimiters if none are set)
--hive-overwrite: Overwrite existing data in the Hive table
--create-hive-table: If set, the job will fail if the target Hive table already exists. By default this property is false.
--hive-table <table-name>: Sets the table name to use when importing to Hive
--hive-drop-import-delims: Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement: Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key: Name of the Hive field on which partitions are sharded
--hive-partition-value <v>: String value that serves as the partition key for the data imported into Hive in this job
--map-column-hive <map>: Override default mapping from SQL type to Hive type for configured columns
--hive-database: Specifies the name of the Hive database to use
Note:
  • The table name used in Hive is, by default, the same as that of the source RDBMS table. We can change the output table name with the --hive-table option.
  • If the Hive table already exists, we can overwrite it with the --hive-overwrite option.
  • Hive importing doesn’t support the --as-avrodatafile and --as-sequencefile clauses.
  • By default, Hive’s row delimiters are the \n and \r characters and its column delimiter is the \01 character; if source RDBMS tables contain these characters inside string fields, then we need to use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data, or the --hive-delims-replacement option to replace those characters with a user-defined string.
  • Sqoop can import data for Hive into a particular partition by specifying the --hive-partition-key and --hive-partition-value arguments.
Command: using --hive-import and --create-hive-table
--hive-import - imports the table into Hive
--create-hive-table - creates a new Hive table if it does not exist; if the table already exists, the job fails

mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --warehouse-dir /user/hive/warehouse --hive-import --create-hive-table
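
As a follow-up, a sketch combining a few more of the Hive arguments from the table above (the Hive database name sqoop_hive and table name students_hive are just placeholders):

mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --delete-target-dir --hive-import --hive-database sqoop_hive --hive-table students_hive --hive-overwrite --hive-drop-import-delims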


Will update further details related to HBase and Accumulo ASAP.


Please follow the link for further details ==> Sqoop_Page5



4 comments:

  1. I wonder whether anyone can write a sqoop import query for the below conditions:

    There is a system which will process dumps of a file being imported from MySQL through Sqoop.
    However, the system is designed in such a way that it can only process files where fields are enclosed in single quotes ('), the field separator is a hyphen (-), and lines are terminated by a colon (:). If the data itself contains a double quote ("), it should be escaped by \.

    Replies
    1. Hi,

      i tried to solve this question with this sqoop import:
      sqoop import \
      --table departments \
      --connect jdbc:mysql://quickstart:3306/retail_db \
      --username retail_dba \
      --password cloudera \
      --target-dir /user/cloudera/tests/departments_test \
      --enclosed-by "'" \
      --escaped-by "\\" \ # here the "\" doesn't work, you should put "\\"
      --lines-terminated-by ":" \
      --fields-terminated-by "-"

      before running this sqoop command, i inserted a row in departments table : (11111,'just for test "department_test"').
      so after executing the sqoop command i had this result :

      hdfs dfs -cat /user/cloudera/tests/departments_test/part*

      '2'-'Fitness':'3'-'Footwear':'4'-'Apparel':'5'-'Golf':'6'-'Outdoors':'7'-'Fan Shop':'11111111'-'just for test "department_test"'

      It seems good except for the --escaped-by "\\", and I don't know why the 'just for test "department_test"' value didn't change. It should be like 'just for test \"department_test\"'.

      I hope it helped you, and if you have an explanation for the 'just for test "department_test"' value, please let me know :).

      Thanks.

