Introduction:
What is Sqoop?
Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes.
Intent of Sqoop Usage:
- Use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle, or from a mainframe, into the Hadoop Distributed File System (HDFS),
- transform the data in Hadoop MapReduce (processing the data based on conditions/constraints) ==> yields the analyzed data,
- and then export the analyzed data (or any other data) back into an RDBMS, as sketched just after this list.
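A minimal sketch of that import ==> transform ==> export cycle, assuming a hypothetical MySQL database shop on host dbhost and a results table order_totals (all connection details, table names, and HDFS paths here are made-up placeholders):

# 1) Import an RDBMS table into HDFS
sqoop import --connect jdbc:mysql://dbhost/shop --username dbuser -P \
  --table orders --target-dir /user/hadoop/orders

# 2) Transform the imported files with MapReduce or Hive,
#    writing the analyzed output to /user/hadoop/order_totals

# 3) Export the analyzed data back into the RDBMS
sqoop export --connect jdbc:mysql://dbhost/shop --username dbuser -P \
  --table order_totals --export-dir /user/hadoop/order_totals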
Highlights of Sqoop:
- Sqoop automates most of this process, relying on the database to describe the schema of the data to be imported.
- Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance :) (see the num-mappers example below).
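For example, the number of parallel map tasks can be tuned with the -m/--num-mappers option (a sketch using the same hypothetical connection details as above; Sqoop defaults to 4 map tasks):

# Run the import with 8 parallel map tasks instead of the default 4
sqoop import --connect jdbc:mysql://dbhost/shop --username dbuser -P \
  --table orders --target-dir /user/hadoop/orders \
  --num-mappers 8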
Sqoop Basic Usage:
i)Import process:
We can import data from a relational database system or a mainframe into HDFS. The input to the import process is either a database table or mainframe datasets.
- For databases, Sqoop will read the table row-by-row into HDFS.
- For mainframe datasets, Sqoop will read records from each mainframe dataset into HDFS (see the import sketch below).
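A basic import sketch, assuming a hypothetical MySQL table employees with a numeric primary key id (for mainframe datasets, Sqoop ships a separate import-mainframe tool):

# Read the employees table row-by-row and store it in HDFS as delimited text files
sqoop import \
  --connect jdbc:mysql://dbhost/corp \
  --username dbuser -P \
  --table employees \
  --target-dir /user/hadoop/employees \
  --split-by id

Here --split-by names the column used to divide the rows among the parallel map tasks; by default Sqoop splits on the table's primary key.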
The output of this import process is a set of files containing a copy of the imported table or datasets.
Note:
The import process is performed in parallel. For this reason, the output will be in multiple files.
So, after the import process, we perform the processing logic using MapReduce, Hive, etc. (see the illustration below).
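Two illustrative follow-ups, again with hypothetical paths and table names: listing the per-mapper part files that the parallel import produced, and re-running the import with --hive-import so the data lands directly in a Hive table for querying:

# Each map task writes one file, e.g. part-m-00000 ... part-m-00003 for 4 mappers
hdfs dfs -ls /user/hadoop/employees

# Or import straight into Hive so the processing logic can start as a Hive query
sqoop import --connect jdbc:mysql://dbhost/corp --username dbuser -P \
  --table employees --hive-import --hive-table employees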
ii)Export process:
Sqoop’s export process will read a set of delimited text files from HDFS in parallel, parse them into records, and insert them as new rows in a target database table, for consumption by external applications or users.
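A hedged export sketch, assuming the analyzed results sit in a hypothetical HDFS directory as comma-delimited text and that the target table order_totals already exists in the database (Sqoop export does not create it):

# Parse the delimited files and insert each record as a new row in order_totals
sqoop export \
  --connect jdbc:mysql://dbhost/shop \
  --username dbuser -P \
  --table order_totals \
  --export-dir /user/hadoop/order_totals \
  --input-fields-terminated-by ','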
Please follow the link for further details ==> Sqoop_Page 2