Thursday, 7 September 2017

9) Sqoop Tool 6: sqoop-merge


The merge tool combines two datasets, with entries in the newer dataset overwriting entries of the older dataset.

Example:

An incremental import run in last-modified mode (--incremental lastmodified) generates multiple datasets in HDFS, with successively newer data appearing in each dataset.

The merge tool "flattens" two datasets into one, keeping the newest available record for each primary key.
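To make the "newest record wins" rule concrete, here is a minimal local sketch in shell. It is not how Sqoop implements merge (Sqoop runs a MapReduce job over HDFS); it only mimics the result on small local files, assuming comma-separated rows (Sqoop's default text format), the merge key in the first field, and unique keys within each file. The helper name merge_datasets and the sample rows are hypothetical.

```shell
#!/bin/sh
# Local illustration of merge semantics, NOT Sqoop's implementation.
# Assumptions: comma-separated rows, merge key in the first field,
# keys unique within each file.

# merge_datasets OLD NEW  (hypothetical helper)
# Prints every NEW row, plus each OLD row whose key is absent from NEW,
# sorted numerically by key.
merge_datasets() {
  {
    awk -F',' '
      # First file argument (NEW): record which keys it contains.
      FILENAME == ARGV[1] { in_new[$1] = 1; next }
      # Second file argument (OLD): keep rows not overridden by NEW.
      !($1 in in_new) { print }
    ' "$2" "$1"
    # Newer rows always win, and keys present only in NEW are added.
    cat "$2"
  } | sort -t',' -k1,1n
}

# Hypothetical sample data standing in for the two imported datasets.
workdir=$(mktemp -d)
printf '1,Asha,80\n5,Ravi,70\n8,Meena,60\n' > "$workdir/old.csv"
printf '5,Ravi,95\n8,Meena,88\n' > "$workdir/new.csv"

merge_datasets "$workdir/old.csv" "$workdir/new.csv"
# prints:
# 1,Asha,80
# 5,Ravi,95
# 8,Meena,88

rm -r "$workdir"
```

Row 1 exists only in the older file and is kept unchanged, while rows 5 and 8 are taken from the newer file. A key that appears only in the newer file would simply be added to the result.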

Syntax:

$ sqoop merge (generic-args) (merge-args)
or
$ sqoop-merge (generic-args) (merge-args)

Merge options:
Argument              Description
--class-name <class>  Specify the name of the record-specific class to use during the merge job.
--jar-file <file>     Specify the name of the jar to load the record class from.
--merge-key <col>     Specify the name of a column to use as the merge key.
--new-data <path>     Specify the path of the newer dataset.
--onto <path>         Specify the path of the older dataset.
--target-dir <path>   Specify the target path for the output of the merge job.

Dataset1 in MySQL (students table, sqoop_test database):


Import Dataset1 to HDFS:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --target-dir '/MANO/sqoop_merge/dataset1/import' --outdir javafiles
 

 
Dataset2 in MySQL (the same students table; only the rows with id 5 and 8 are imported below):

 


Import Dataset2 to HDFS:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --where "id in (5,8)" -m 1 --target-dir '/MANO/sqoop_merge/dataset2/import' --outdir javafiles


Merge datasets:
Syntax:
sqoop-merge --merge-key <primary_key_column> \
--new-data <path_of_newer_dataset> \
--onto <path_of_older_dataset> \
--target-dir <output_path> \
--class-name <record_class> \
--jar-file <jar generated by the last import>

mano@Mano:~$ sqoop-merge --merge-key id --new-data /MANO/sqoop_merge/dataset2/import --onto /MANO/sqoop_merge/dataset1/import --target-dir /MANO/sqoop_merge/merge/dataset --class-name students --jar-file /tmp/sqoop-mano/compile/f688439e671117f44bec5f7cec98ae28/students.jar
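The --jar-file value above is the jar Sqoop compiled for the students record class during an import; the import log reports it on a line of the form "Writing jar file: ...". Because the hash-named directory changes on every run, a small helper can pick the most recent jar. This is a sketch that assumes the /tmp/sqoop-<user>/compile/<hash>/<table>.jar layout seen in the path above; latest_sqoop_jar is a hypothetical name, not a Sqoop command.

```shell
#!/bin/sh
# latest_sqoop_jar TABLE [COMPILE_ROOT]  (hypothetical helper)
# Prints the most recently modified TABLE.jar found one level below
# COMPILE_ROOT. ASSUMPTION: COMPILE_ROOT defaults to
# /tmp/sqoop-<user>/compile, inferred from the jar path in the merge
# command above, with each import writing to its own hash subdirectory.
latest_sqoop_jar() {
  table=$1
  user=${USER:-$(id -un)}
  root=${2:-/tmp/sqoop-$user/compile}
  # ls -t lists newest first; stderr is hidden so "no match" prints nothing.
  ls -t "$root"/*/"$table".jar 2>/dev/null | head -n 1
}

jar=$(latest_sqoop_jar students)
echo "Jar to pass to --jar-file: ${jar:-<none found>}"
```

The printed path can then be passed to sqoop-merge as --jar-file "$jar". Alternatively, giving the import the --bindir code-generation option should place the compiled jar in a fixed directory instead of a new temporary one.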

Final merged dataset (written to /MANO/sqoop_merge/merge/dataset):

