Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data and makes querying and analysis easy.
Hive was initially developed by Facebook; the Apache Software Foundation later took it up and developed it further as an open source project under the name Apache Hive. It is used by many companies. For example, Amazon uses it in Amazon Elastic MapReduce.
These are Hive's chief characteristics:
- Hive is designed for querying and managing only structured data stored in tables
- Hive is scalable, fast, and uses familiar concepts
- The schema is stored in a database (the metastore), while the processed data goes into the Hadoop Distributed File System (HDFS)
- Tables and databases get created first; then data gets loaded into the proper tables
- Hive supports four file formats: ORC (Optimized Row Columnar), SEQUENCEFILE, RCFILE (Record Columnar File), and TEXTFILE
- Hive uses an SQL-inspired language, sparing the user from dealing with the complexity of MapReduce programming. It makes learning more accessible by using concepts familiar from relational databases, such as columns, tables, rows, and schemas
- The most significant difference between the Hive Query Language (HQL) and SQL is that Hive executes queries on Hadoop's infrastructure instead of on a traditional database
- Since Hadoop works on flat files, Hive uses directory structures to "partition" data, improving performance for queries that filter on the partition columns (see the sketch after this list).
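
To make these characteristics concrete, here is a minimal HiveQL sketch. The database, table, and column names are hypothetical, and the INSERT ... VALUES syntax assumes Hive 0.14 or later. It creates a partitioned, ORC-backed table, loads a few rows into one partition, and runs a familiar SQL-style aggregation against it.

```sql
-- Database, table, and column names are hypothetical, for illustration only.
CREATE DATABASE IF NOT EXISTS retail;

-- A partitioned table stored in one of the supported file formats (ORC).
CREATE TABLE IF NOT EXISTS retail.sales (
  order_id BIGINT,
  product  STRING,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (sale_date STRING)   -- each partition maps to its own HDFS directory
STORED AS ORC;

-- Load a few rows into one partition (INSERT ... VALUES needs Hive 0.14+;
-- bulk loads would normally use LOAD DATA or INSERT ... SELECT instead).
INSERT INTO TABLE retail.sales PARTITION (sale_date = '2023-01-15')
VALUES (1, 'widget', 9.99),
       (2, 'gadget', 24.50);

-- A familiar SQL-style query; filtering on the partition column lets Hive
-- read only the matching directory instead of scanning the whole table.
SELECT product, SUM(amount) AS total_amount
FROM retail.sales
WHERE sale_date = '2023-01-15'
GROUP BY product;
```

Behind the scenes, each sale_date value becomes its own directory under the table's HDFS location, which is why the partition filter avoids a full table scan.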
Of course, no resource is perfect, and Hive has some limitations. They are:
- Hive supports Online Analytical Processing (OLAP), but not Online Transaction Processing (OLTP).
- Its support for subqueries is limited.
- It has high latency, because queries are compiled into batch jobs rather than executed interactively.
- Hive tables don’t support row-level delete or update operations (see the sketch below for the usual workaround).
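
Since row-level changes aren't available on classic (non-ACID) Hive tables, the usual workaround is to recompute and rewrite an entire partition. The sketch below is only illustrative: it reuses the hypothetical retail.sales table from the earlier example and stages the corrected rows in a hypothetical retail.sales_fixed table.

```sql
-- Hypothetical staging table holding the corrected rows for one day.
CREATE TABLE IF NOT EXISTS retail.sales_fixed LIKE retail.sales;

INSERT INTO TABLE retail.sales_fixed PARTITION (sale_date = '2023-01-15')
VALUES (1, 'widget', 9.99);   -- the corrected contents of that partition

-- With no row-level UPDATE or DELETE, the whole affected partition is
-- rewritten in one shot from the staged, corrected data.
INSERT OVERWRITE TABLE retail.sales PARTITION (sale_date = '2023-01-15')
SELECT order_id, product, amount
FROM retail.sales_fixed
WHERE sale_date = '2023-01-15';
```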