5)Incremental Imports:
Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.
Two types of incremental imports:
append - append mode when importing a table where new rows are being added with increasing row id values.(Only new insertions)
lastmodified - when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp.
Working Mechanism:
Based on --incremental argument, it decides either append or lastmodified mode, then the --check-column argument, check the value of column > than --last-value
 
 
 
 
 
Example: using argument append
Step 1: Table on mysql database ==> which acts as source for hdfs to import
Step 3: Normal import table to HDFS
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --target-dir /MANO/Sqoop_import_table/students
Data on HDFS after import
Step 4: using incremental import by append mode for new records on source databse
Let's first insert new record on to table and see,
Command:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --check-column id --incremental append --last-value 5 -m 1 --target-dir /MANO/Sqoop_import_table/students
Output:
17/09/04 12:50:44 INFO tool.ImportTool: --incremental append
17/09/04 12:50:44 INFO tool.ImportTool: --check-column id
17/09/04 12:50:44 INFO tool.ImportTool: --last-value 6
17/09/04 12:50:44 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part-m-00001
6,Govind,2017-09-04
Example: using argument lastmodified
Step 1: After data update on database table
Step 2: View existing data on HDFS
mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part*
1,Mano,2017-09-04
2,Govind,2017-09-04
3,Mahan,2017-09-04
4,Madhan,2017-09-04
5,Kishore,2017-09-04
6,Govind,2017-09-04
mano@Mano:~$
Step 3: using argument lastmodified incremental mode
Command:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --check-column last_update --incremental lastmodified --last-value 2017-09-04 --target-dir /MANO/Sqoop_import_table/students --append
Note:
We need to use the --append import control argument , since we are writing on existing directory
Output:
17/09/04 13:23:35 INFO mapreduce.ImportJobBase: Retrieved 6 records.
17/09/04 13:23:35 INFO util.AppendUtils: Appending to directory students
17/09/04 13:23:35 INFO util.AppendUtils: Using found partition 2
17/09/04 13:23:35 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
17/09/04 13:23:35 INFO tool.ImportTool: --incremental lastmodified
17/09/04 13:23:35 INFO tool.ImportTool: --check-column last_update
17/09/04 13:23:35 INFO tool.ImportTool: --last-value 2017-09-04 13:22:07.0
17/09/04 13:23:35 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part-m-00002
1,Chella,2017-09-04
2,Prasath,2017-09-04
3,Mano,2017-09-04
4,Madhan,2017-09-04
5,Kishore,2017-09-04
6,Govind,2017-09-04
Note : we might see the separate file for each operation , we perform on HDFS
  
 
6)Output line formatting arguments:
importing delimited files, the choice of delimiter is important. Delimiters which appear inside string-based fields may cause ambiguous parsing of the imported data by subsequent analysis passes.
Example:
the string "Hello, happy to see you" should not be imported with the end-of-field delimiter set to a comma.
 
 
 
 
 
Delimiters may be specified as:
\n (newline)
\r (carriage return)
\t (tab)
\" (double-quote)
\\' (single-quote)
\\ (backslash)
\0 (NUL) - This will insert NUL characters between fields or lines, or will disable enclosing/escaping if used for one of the --enclosed-by, --optionally-enclosed-by, or --escaped-by arguments.
Note:
The default delimiters are a comma (,) for fields, a newline (\n) for records, no quote character, and no escape character.
Command: using --mysql-delimiters
Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
Sql data on Table:
Import using --mysql-delimiters
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --mysql-delimiters
Command: using --enclosed-by and --escaped-by
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --enclosed-by "\\'" --escaped-by '\'
Import results:
Note: on the above, we see the enclosed-by single qutes applied to all the fields,
 
to avoid that, we can use --optionally-enclosed -by argument
Command: using --optionally-enclosed-by
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --optionally-enclosed-by '\"' --escaped-by '\'
Sql data on Table:
Import results:
Output:
mano@Mano:~$ hadoop fs -cat /user/mano/students/par*
1|Chella|2017-09-04 2|Prasath|2017-09-04 3|Mano|2017-09-04 4|Madhan|2017-09-04 5|Kishore|2017-09-04 6|Govind|2017-09-04 6|This could|2017-09-05 7|"Hey,dear"|2017-09-05
 
These arguments are used for parsing input files to specify the enclosed by/escaped by characters etc when exporting the files from HDFS to RDBMS tables. 
Note:
These delimiters are same as the one set in previous section for storing into HDFS files, which will act as input files for exporting into RDBMS from HDFS.
As sqoop generates a Java class which can reinterpret the text files and for generated parse() method reinterprets this data.
 
 
 
 
 
 
 
Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.
Two types of incremental imports:
append - append mode when importing a table where new rows are being added with increasing row id values.(Only new insertions)
lastmodified - when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp.
Working Mechanism:
Based on --incremental argument, it decides either append or lastmodified mode, then the --check-column argument, check the value of column > than --last-value
| Argument | Description | 
| --check-column (col) | Specifies the column to be examined when determining which rows to import. (the column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/ LONGVARCHAR/LONGNVARCHAR) | 
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified. | 
| --last-value (value) | Specifies the maximum value of the check column from the previous import. | 
Example: using argument append
Step 1: Table on mysql database ==> which acts as source for hdfs to import
Step 3: Normal import table to HDFS
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --target-dir /MANO/Sqoop_import_table/students
Data on HDFS after import
Step 4: using incremental import by append mode for new records on source databse
Let's first insert new record on to table and see,
Command:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --check-column id --incremental append --last-value 5 -m 1 --target-dir /MANO/Sqoop_import_table/students
Output:
17/09/04 12:50:44 INFO tool.ImportTool: --incremental append
17/09/04 12:50:44 INFO tool.ImportTool: --check-column id
17/09/04 12:50:44 INFO tool.ImportTool: --last-value 6
17/09/04 12:50:44 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part-m-00001
6,Govind,2017-09-04
Example: using argument lastmodified
Step 1: After data update on database table
Step 2: View existing data on HDFS
mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part*
1,Mano,2017-09-04
2,Govind,2017-09-04
3,Mahan,2017-09-04
4,Madhan,2017-09-04
5,Kishore,2017-09-04
6,Govind,2017-09-04
mano@Mano:~$
Step 3: using argument lastmodified incremental mode
Command:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --check-column last_update --incremental lastmodified --last-value 2017-09-04 --target-dir /MANO/Sqoop_import_table/students --append
Note:
We need to use the --append import control argument , since we are writing on existing directory
Output:
17/09/04 13:23:35 INFO mapreduce.ImportJobBase: Retrieved 6 records.
17/09/04 13:23:35 INFO util.AppendUtils: Appending to directory students
17/09/04 13:23:35 INFO util.AppendUtils: Using found partition 2
17/09/04 13:23:35 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
17/09/04 13:23:35 INFO tool.ImportTool: --incremental lastmodified
17/09/04 13:23:35 INFO tool.ImportTool: --check-column last_update
17/09/04 13:23:35 INFO tool.ImportTool: --last-value 2017-09-04 13:22:07.0
17/09/04 13:23:35 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
mano@Mano:~$ hadoop fs -cat /MANO/Sqoop_import_table/students/part-m-00002
1,Chella,2017-09-04
2,Prasath,2017-09-04
3,Mano,2017-09-04
4,Madhan,2017-09-04
5,Kishore,2017-09-04
6,Govind,2017-09-04
Note : we might see the separate file for each operation , we perform on HDFS
| Argument | Description | Example | 
| --check-column (col) | Specifies the column to be examined when determining which rows to import. (the column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/ LONGVARCHAR/LONGNVARCHAR) | Done | 
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified. | Done | 
| --last-value (value) | Specifies the maximum value of the check column from the previous import. | Done | 
6)Output line formatting arguments:
importing delimited files, the choice of delimiter is important. Delimiters which appear inside string-based fields may cause ambiguous parsing of the imported data by subsequent analysis passes.
Example:
the string "Hello, happy to see you" should not be imported with the end-of-field delimiter set to a comma.
| Argument | Description | 
| --enclosed-by <char> | Sets a required field enclosing character | 
| --escaped-by <char> | Sets the escape character | 
| --fields-terminated-by <char> | Sets the field separator character | 
| --lines-terminated-by <char> | Sets the end-of-line character | 
| --mysql-delimiters | Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' | 
| --optionally-enclosed-by <char> | Sets a field enclosing character | 
Delimiters may be specified as:
- a character (--fields-terminated-by X)
- an escape character (--fields-terminated-by \t). Supported escape characters are:
\n (newline)
\r (carriage return)
\t (tab)
\" (double-quote)
\\' (single-quote)
\\ (backslash)
\0 (NUL) - This will insert NUL characters between fields or lines, or will disable enclosing/escaping if used for one of the --enclosed-by, --optionally-enclosed-by, or --escaped-by arguments.
Note:
The default delimiters are a comma (,) for fields, a newline (\n) for records, no quote character, and no escape character.
Command: using --mysql-delimiters
Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
Sql data on Table:
Import using --mysql-delimiters
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --mysql-delimiters
Command: using --enclosed-by and --escaped-by
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --enclosed-by "\\'" --escaped-by '\'
Import results:
Note: on the above, we see the enclosed-by single qutes applied to all the fields,
to avoid that, we can use --optionally-enclosed -by argument
Command: using --optionally-enclosed-by
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --optionally-enclosed-by '\"' --escaped-by '\'
Sql data on Table:
Import results:
Command: Using --fields-terminated-by and --lines-terminated-by
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --delete-target-dir -m 1 --fields-terminated-by '|' --lines-terminated-by '\t' Output:
mano@Mano:~$ hadoop fs -cat /user/mano/students/par*
1|Chella|2017-09-04 2|Prasath|2017-09-04 3|Mano|2017-09-04 4|Madhan|2017-09-04 5|Kishore|2017-09-04 6|Govind|2017-09-04 6|This could|2017-09-05 7|"Hey,dear"|2017-09-05
Output line formatting arguments status table:
 
 
 
 
| Argument | Description | Example | 
| --enclosed-by <char> | Sets a required field enclosing character | Done | 
| --escaped-by <char> | Sets the escape character | Done | 
| --fields-terminated-by <char> | Sets the field separator character | Done | 
| --lines-terminated-by <char> | Sets the end-of-line character | Done | 
| --mysql-delimiters | Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' | Done | 
| --optionally-enclosed-by <char> | Sets a field enclosing character | Done | 
7)Input parsing arguments:
Note:
These delimiters are same as the one set in previous section for storing into HDFS files, which will act as input files for exporting into RDBMS from HDFS.
As sqoop generates a Java class which can reinterpret the text files and for generated parse() method reinterprets this data.
| Argument | Description | 
| --input-enclosed-by <char> | Sets a required field encloser | 
| --input-escaped-by <char> | Sets the input escape character | 
| --input-fields-terminated-by <char> | Sets the input field separator | 
| --input-lines-terminated-by <char> | Sets the input end-of-line character | 
| --input-optionally-enclosed-by <char> | Sets a field enclosing character | 
It's same like about output line fields, will update the examples for this as well soon 😊
8)Hive arguments:
Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data’s layout in Hive. Importing data into Hive is as simple as adding the --hive-import option to your Sqoop command line.
use the --hive-home option to identify the Hive installation directory(if in the case, where multiple hive's or manual pointing). Sqoop will use $HIVE_HOME/bin/hive from here
Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data’s layout in Hive. Importing data into Hive is as simple as adding the --hive-import option to your Sqoop command line.
use the --hive-home option to identify the Hive installation directory(if in the case, where multiple hive's or manual pointing). Sqoop will use $HIVE_HOME/bin/hive from here
| Argument | Description | 
| --hive-home <dir> | Override $HIVE_HOME | 
| --hive-import | Import tables into Hive (Uses Hive’s default delimiters if none are set.) | 
| --hive-overwrite | Overwrite existing data in the Hive table. | 
| --create-hive-table | If set, then the job will fail if the target hive | 
| table exits. By default this property is false. | |
| --hive-table <table-name> | Sets the table name to use when importing to Hive. | 
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive. | 
| --hive-delims-replacement | Replace \n, \r, and \01 from string fields with user defined string when importing to Hive. | 
| --hive-partition-key | Name of a hive field to partition are sharded on | 
| --hive-partition-value <v> | String-value that serves as partition key for this imported into hive in this job. | 
| --map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns | 
--hive-database ==> to define the name of database to be use. 
Note:
- The table name used in Hive is, by default, the same as that of the source RDBMS table. We can change the output table name with the –hive-table option.
- If the Hive table already exists, we can override it with –hive-overwrite option.
- Hive Importing doesn’t support –as-avrodatafile and –as-sequencefile clauses.
- By default, row delimiters in hive are \n and \r characters and column delimiters \001 characters, if source RDBMS tables contain these characters inside fields, then we need to use –hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data, or –hive-delims-replacement option to replace those characters with a user-defined characters.
- Sqoop can import data for Hive into a particular partition by specifying the –hive-partition-key and –hive-partition-value arguments
Command: using --hive-import and --create-hive-table
--hive-import - import table into hive
--create-hive-table - creates a new hive table if not exists, else it fails
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --warehouse-dir /user/hive/warehouse --hive-import --create-hive-table
--hive-import - import table into hive
--create-hive-table - creates a new hive table if not exists, else it fails
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --warehouse-dir /user/hive/warehouse --hive-import --create-hive-table
Will update further details related to HBASE, ACCUMULO ASAP.
Please follow the link for further ==> Sqoop_Page5









 
  
 
 
 
 
I wonder, whether anyone can write sqoop import query for the below conditions:
ReplyDeleteThere is a system which will process dumps of a file being imported from Mysql through Sqoop.
However, system is designed the way that it can process only files if fields are enclosed in(') single quote and separate of the field should be (-} and line needs to be terminated by : (colon). If data itself contains the " (double quote } than it should be escaped by \.
Hi,
Deletei tried to solve this question with this sqoop import:
sqoop import \
--table departments \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--target-dir /user/cloudera/tests/departments_test \
--enclosed-by "'" \
--escaped-by "\\" \ # here the "\" doesn't work, you shoul put "\\"
--lines-terminated-by ":" \
--fields-terminated-by "-"
before running this sqoop command, i inserted a row in departments table : (11111,'just for test "department_test"').
so after executing the sqoop command i had this result :
hdfs dfs -cat /user/cloudera/tests/departments_test/part*
'2'-'Fitness':'3'-'Footwear':'4'-'Apparel':'5'-'Golf':'6'-'Outdoors':'7'-'Fan Shop':'11111111'-'just for test "department_test"'
It seems good except for the --escaped-by "\\" and i don't know why the 'just for test "department_test"' value didn't changed. it should be like 'just for test \"department_test\"'
i hope it helped you, and i if you have an explanation for the 'just for test "department_test"' value, let me know please :).
Thanks.
"great
ReplyDeleteDigital Marketing Training Course in Chennai | Digital Marketing Training Course in Anna Nagar | Digital Marketing Training Course in OMR | Digital Marketing Training Course in Porur | Digital Marketing Training Course in Tambaram | Digital Marketing Training Course in Velachery
"
I came from Google to your website. It is really awesome blog.
ReplyDeleteDigital Marketing Training Course in Chennai | Digital Marketing Training Course in Anna Nagar | Digital Marketing Training Course in OMR | Digital Marketing Training Course in Porur | Digital Marketing Training Course in Tambaram | Digital Marketing Training Course in Velachery