Sqoop Tool 6: sqoop-merge:
The merge tool lets you combine two datasets, where entries in one dataset should overwrite entries in an older dataset.
Example:
An incremental import run in last-modified mode generates multiple datasets in HDFS, where successively newer data appears in each dataset.
The merge tool "flattens" two such datasets into one, taking the newest available record for each primary key.
Syntax:
$ sqoop merge (generic-args) (merge-args)
or
$ sqoop-merge (generic-args) (merge-args)
Merge options:
Argument | Description
--class-name <class> | Specify the name of the record-specific class to use during the merge job.
--jar-file <file> | Specify the name of the jar to load the record class from.
--merge-key <col> | Specify the name of a column to use as the merge key.
--new-data <path> | Specify the path of the newer dataset.
--onto <path> | Specify the path of the older dataset.
--target-dir <path> | Specify the target path for the output of the merge job.
Dataset1 on MySQL:
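A minimal sketch of what the students table might look like in MySQL (only the database sqoop_test, the table name students, and the id column come from the commands below; the other columns are assumptions):
-- Hypothetical layout of the source table; columns other than id are assumed.
USE sqoop_test;
CREATE TABLE students (
    id   INT PRIMARY KEY,   -- later used as the --merge-key
    name VARCHAR(50),
    city VARCHAR(50)
);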
Import Dataset1 to HDFS:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --target-dir '/MANO/sqoop_merge/dataset1/import' --outdir javafiles
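A quick way to verify the import (the part file name assumes the single-mapper -m 1 import above, which writes one output file):
$ hdfs dfs -ls /MANO/sqoop_merge/dataset1/import
$ hdfs dfs -cat /MANO/sqoop_merge/dataset1/import/part-m-00000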
Dataset2 on MySQL:
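A plausible sketch of the change behind the second dataset, assuming rows 5 and 8 are the ones that were updated (which is why the import below filters on those ids); the new values themselves are assumptions:
-- Hypothetical updates; the ids match the --where "id in (5,8)" filter below.
UPDATE students SET city = 'bangalore' WHERE id = 5;
UPDATE students SET city = 'chennai'   WHERE id = 8;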
Import Dataset2 to HDFS:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --where "id in (5,8)" -m 1 --target-dir '/MANO/sqoop_merge/dataset2/import' --outdir javafiles
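Because of the --where filter, this second dataset should contain only the two changed rows, which can be confirmed the same way (part file name again assumes -m 1):
$ hdfs dfs -cat /MANO/sqoop_merge/dataset2/import/part-m-00000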
Merge datasets:
Syntax:
sqoop-merge --merge-key <primary key column> \
--new-data <path to the newer dataset> \
--onto <path to the older dataset> \
--target-dir <output path for the merged dataset> \
--class-name <record class generated by the last import> \
--jar-file <jar generated by the last import; see below for how to locate it>
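The --class-name and --jar-file values refer to the record class that Sqoop generated during the last import. By default Sqoop writes the generated code under /tmp/sqoop-<username>/compile/<job-id>/, so the jar can usually be located like this (the job-id directory is a generated hash and differs on every run):
$ ls -lt /tmp/sqoop-mano/compile/                                  # the most recent directory comes from the last import
$ ls /tmp/sqoop-mano/compile/f688439e671117f44bec5f7cec98ae28/     # contains students.jar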
mano@Mano:~$ sqoop-merge --merge-key id --new-data /MANO/sqoop_merge/dataset2/import --onto /MANO/sqoop_merge/dataset1/import --target-dir /MANO/sqoop_merge/merge/dataset --class-name students --jar-file /tmp/sqoop-mano/compile/f688439e671117f44bec5f7cec98ae28/students.jar
Final merge dataset:
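The merged output can be inspected directly in HDFS; the path is the --target-dir from the merge command, and each id should now appear exactly once, with ids 5 and 8 carrying the newer values:
$ hdfs dfs -ls /MANO/sqoop_merge/merge/dataset
$ hdfs dfs -cat '/MANO/sqoop_merge/merge/dataset/part-*'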
Please follow the link for further details ==> Sqoop_Page10