Apache Hive:
The Apache Hive is a data warehouse software that is built on top of Apache Hadoop for data analysis., that facilitates reading, writing, and managing large data sets residing in distributed storage(HDFS).
Note:
Hive provides a mechanism to impose structure for a variety of data formats on Hadoop and to query that data using a SQL-like language called HiveQL (HQL).
Apache Hive provides the following features:
WebHCat provides a service that you can use to run Hadoop MapReduce (or YARN), Pig, Hive jobs or perform Hive metadata operations using an HTTP (REST style) interface.
Hive Execution engines and properties:
There are currently three execution engines , following are
1.Defualt MapReduce engine,
The Apache Hive is a data warehouse software that is built on top of Apache Hadoop for data analysis., that facilitates reading, writing, and managing large data sets residing in distributed storage(HDFS).
Note:
Hive provides a mechanism to impose structure for a variety of data formats on Hadoop and to query that data using a SQL-like language called HiveQL (HQL).
Hive was originated in Facebook.
- Hive tools to enable easy access to data via SQL interface, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
- A mechanism to impose structure on a variety of data formats
- Hive access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
- Hive Query execution via Apache Tez™, Apache Spark™, or MapReduce(Default)
Limitations of Hive:
• Hive is not designed for Online transaction processing (OLTP ), it is only used for the Online Analytical Processing.
• Hive supports overwriting data, but not updates and deletes.
Hive is used inspite of Pig?
1)HCatalog:
HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools — Pig, MapReduce — to more easily read and write data on the grid.
2)WebHCat:
• Hive is not designed for Online transaction processing (OLTP ), it is only used for the Online Analytical Processing.
• Hive supports overwriting data, but not updates and deletes.
Hive is used inspite of Pig?
- Hive-QL is a declarative language like SQL, PigLatin is a data flow language.
- Pig: a data-flow language and environment for exploring very large datasets.
- Hive: a distributed data warehouse.
1)HCatalog:
HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools — Pig, MapReduce — to more easily read and write data on the grid.
2)WebHCat:
WebHCat provides a service that you can use to run Hadoop MapReduce (or YARN), Pig, Hive jobs or perform Hive metadata operations using an HTTP (REST style) interface.
Hive Execution engines and properties:
There are currently three execution engines , following are
1.Defualt MapReduce engine,
hive.execution.engine=mr
2.TEZ engine,
set hive.execution.engine=tez;
3.Spark engine
set hive.execution.engine=spark;Please click next to proceed further ==> Next page
Very niceblog,keep Updating.
ReplyDeleteThank you for info..
big data hadoop course
Thanks for the useful information.Keep sharing.
ReplyDeleteThanks for the useful information.keep sharing.
Machine Learning training in Pallikranai Chennai
Data science training in Pallikaranai
Python Training in Pallikaranai chennai
Bigdata training in Pallikaranai chennai
Spark with ML training in Pallikaranai chennai
what is microsoft azure
ReplyDeleteazure free trial account
azure adf
azure data factory interview questions
azure certification path
azure traffic manager