Hive
Apache Hive is open-source data warehouse software designed to read, write, and manage large datasets stored in the Apache Hadoop Distributed File System (HDFS), one component of the larger Hadoop ecosystem.
Hive processing support
Apache Hive supports the analysis of large datasets stored in Hadoop's HDFS and in compatible file systems such as Amazon S3, Azure Blob Storage, Azure Data Lake Storage, and Google Cloud Storage.
It provides a SQL-like query language called HiveQL with schema-on-read, meaning the table schema is applied when the data is queried rather than when it is written, and it transparently converts queries into Apache Spark, MapReduce, or Apache Tez jobs.
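The schema-on-read idea can be illustrated with a minimal Python sketch. This is not Hive code and does not use any Hive API: the raw data, the `SCHEMA` list, and the `read_with_schema` helper are all hypothetical names chosen for illustration. The point is only that the stored text carries no types, and a schema is imposed at read time.

```python
import csv
import io

# Raw data as it might sit in a file: plain text, no types attached at write time.
# (Hypothetical sample data.)
RAW_FILE = "1,alice,34.5\n2,bob,not_a_number\n3,carol,12.0\n"

# Hypothetical schema, supplied only when the data is read.
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def read_with_schema(raw_text, schema):
    """Parse delimited text and cast each field according to the schema.

    A value that cannot be cast becomes None instead of raising an error,
    an assumption made here to show that bad fields surface at read time.
    """
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        row = {}
        for (column, cast), value in zip(schema, record):
            try:
                row[column] = cast(value)
            except ValueError:
                row[column] = None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_FILE, SCHEMA)
```

Because the file itself is never rewritten, the same raw text could be read again later with a different schema, which is the main practical consequence of this design.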
The Hive Metastore
The metastore is the central repository of the Apache Hive infrastructure and is where all of Hive's metadata is stored. Within the metastore, metadata can also be organized into Hive tables and partitions, which makes it possible to compare data across relational databases.
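As a rough mental model, a metastore entry can be thought of as a record describing a table's columns, its storage location, and its partitions. The sketch below is a toy Python stand-in, not the actual metastore schema or API: `TableMetadata`, `add_partition`, the `sales` table, and the `/warehouse/sales` path are all hypothetical. It assumes the common `key=value` directory layout for partitions.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Toy stand-in for one table entry in a metastore (hypothetical structure)."""
    name: str
    columns: dict          # column name -> type name
    location: str          # base directory of the table's files
    partition_keys: list   # columns used to partition the table
    partitions: dict = field(default_factory=dict)  # partition values -> directory

    def add_partition(self, *values):
        # Build one key=value path segment per partition column.
        segments = [f"{k}={v}" for k, v in zip(self.partition_keys, values)]
        path = "/".join([self.location] + segments)
        self.partitions[values] = path
        return path


# Hypothetical table and paths, for illustration only.
sales = TableMetadata(
    name="sales",
    columns={"id": "int", "amount": "double"},
    location="/warehouse/sales",
    partition_keys=["year", "month"],
)
sales.add_partition("2024", "01")
sales.add_partition("2024", "02")

# A lookup answered from metadata alone, without touching the data files.
feb_path = sales.partitions[("2024", "02")]
```

Keeping this information separate from the data files is what lets a query engine decide which directories to read before it opens any of them.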