Sqoop Tool 6: sqoop-merge:
The merge tool lets you combine two datasets, where entries in one dataset should overwrite entries in an older dataset.
Example:
An incremental import run in last-modified mode generates multiple datasets in HDFS, where successively newer data appears in each dataset.
The merge tool "flattens" two such datasets into one, taking the newest available record for each primary key.
Syntax:
$ sqoop merge (generic-args) (merge-args)
or
$ sqoop-merge (generic-args) (merge-args)
Merge options:
Argument | Description
--class-name <class> | Specify the name of the record-specific class to use during the merge job.
--jar-file <file> | Specify the name of the jar to load the record class from.
--merge-key <col> | Specify the name of a column to use as the merge key.
--new-data <path> | Specify the path of the newer dataset.
--onto <path> | Specify the path of the older dataset.
--target-dir <path> | Specify the target path for the output of the merge job.
Dataset1 on MySQL:
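A minimal sketch of what the students table might look like in MySQL (only the database sqoop_test, the table name students, and the id column come from the commands below; the other columns are assumptions):
-- Hypothetical layout of the source table; columns other than id are assumed.
USE sqoop_test;
CREATE TABLE students (
    id   INT PRIMARY KEY,   -- later used as the --merge-key
    name VARCHAR(50),
    city VARCHAR(50)
);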
Import Dataset1 to HDFS:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students -m 1 --target-dir '/MANO/sqoop_merge/dataset1/import' --outdir javafiles
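A quick way to verify the import (the part file name assumes the single-mapper -m 1 import above, which writes one output file):
$ hdfs dfs -ls /MANO/sqoop_merge/dataset1/import
$ hdfs dfs -cat /MANO/sqoop_merge/dataset1/import/part-m-00000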
Dataset2 on MySQL:
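A plausible sketch of the change behind the second dataset, assuming rows 5 and 8 are the ones that were updated (which is why the import below filters on those ids); the new values themselves are assumptions:
-- Hypothetical updates; the ids match the --where "id in (5,8)" filter below.
UPDATE students SET city = 'bangalore' WHERE id = 5;
UPDATE students SET city = 'chennai'   WHERE id = 8;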
Import Dataset2 to HDFS:
mano@Mano:~$ sqoop-import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table students --where "id in (5,8)" -m 1 --target-dir '/MANO/sqoop_merge/dataset2/import' --outdir javafiles
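Because of the --where filter, this second dataset should contain only the two changed rows, which can be confirmed the same way (part file name again assumes -m 1):
$ hdfs dfs -cat /MANO/sqoop_merge/dataset2/import/part-m-00000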
Merge datasets:
Syntax:
sqoop-merge --merge-key <primary key column> \
--new-data <path to the newer dataset> \
--onto <path to the older dataset> \
--target-dir <output path for the merged dataset> \
--class-name <record class generated by the last import> \
--jar-file <jar generated by the last import; see below for how to locate it>
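The --class-name and --jar-file values refer to the record class that Sqoop generated during the last import. By default Sqoop writes the generated code under /tmp/sqoop-<username>/compile/<job-id>/, so the jar can usually be located like this (the job-id directory is a generated hash and differs on every run):
$ ls -lt /tmp/sqoop-mano/compile/                                  # the most recent directory comes from the last import
$ ls /tmp/sqoop-mano/compile/f688439e671117f44bec5f7cec98ae28/     # contains students.jar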
mano@Mano:~$ sqoop-merge --merge-key id --new-data /MANO/sqoop_merge/dataset2/import --onto /MANO/sqoop_merge/dataset1/import --target-dir /MANO/sqoop_merge/merge/dataset --class-name students --jar-file /tmp/sqoop-mano/compile/f688439e671117f44bec5f7cec98ae28/students.jar
Final merge dataset:
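The merged output can be inspected directly in HDFS; the path is the --target-dir from the merge command, and each id should now appear exactly once, with ids 5 and 8 carrying the newer values:
$ hdfs dfs -ls /MANO/sqoop_merge/merge/dataset
$ hdfs dfs -cat '/MANO/sqoop_merge/merge/dataset/part-*'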
Please follow the link for further details ==> Sqoop_Page10