Hive

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easier.

Hive was initially developed by Facebook. The Apache Software Foundation later took it up and developed it further as open source software under the name Apache Hive. It is used by many companies; for example, Amazon uses it in Amazon Elastic MapReduce.

Hive's Features

These are Hive's chief characteristics:

  • Hive is designed for querying and managing only structured data stored in tables
  • Hive is scalable, fast for batch analysis of large datasets, and uses familiar concepts
  • The schema is stored in a database (the metastore), while the processed data is stored in the Hadoop Distributed File System (HDFS)
  • Tables and databases get created first; then data gets loaded into the proper tables
  • Hive supports several file formats, including ORC, SEQUENCEFILE, RCFILE (Record Columnar File), and TEXTFILE; newer releases also support formats such as Parquet and Avro
  • Hive uses an SQL-inspired language, sparing the user from dealing with the complexity of MapReduce programming. It is easier to learn because it reuses familiar relational database concepts such as columns, tables, rows, and schemas.
  • The most significant difference between the Hive Query Language (HQL) and SQL is that Hive executes queries on Hadoop's infrastructure instead of on a traditional database
  • Since Hadoop's programming works on flat files, Hive uses directory structures to "partition" data, improving performance on queries that filter on the partition columns.
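
The partitioning idea in the last point can be illustrated with a small sketch. It is not Hive code: it only mimics the `key=value` directory naming that Hive partitions use and shows why a filter on a partition column lets whole directories be skipped. The table root `/warehouse/sales`, the column names `year` and `month`, and all three helper functions are hypothetical names chosen for the illustration.

```python
from pathlib import PurePosixPath


def partition_path(table_root, partition):
    """Build a Hive-style partition directory from ordered key/value pairs."""
    path = PurePosixPath(table_root)
    for key, value in partition.items():
        path = path / f"{key}={value}"
    return str(path)


def parse_partition(path, table_root):
    """Recover the partition key/value pairs encoded in a directory path."""
    relative = PurePosixPath(path).relative_to(table_root)
    return dict(part.split("=", 1) for part in relative.parts)


def prune(paths, table_root, **filters):
    """Keep only directories whose partition values match every filter."""
    return [
        p for p in paths
        if all(parse_partition(p, table_root).get(k) == v
               for k, v in filters.items())
    ]


# Hypothetical table root and partitions, for illustration only.
TABLE_ROOT = "/warehouse/sales"
partitions = [
    {"year": "2023", "month": "12"},
    {"year": "2024", "month": "01"},
    {"year": "2024", "month": "02"},
]
dirs = [partition_path(TABLE_ROOT, p) for p in partitions]

# A query filtering on year = 2024 only needs to read two of the three directories.
selected = prune(dirs, TABLE_ROOT, year="2024")
print(selected)
```

In this sketch the directory names alone decide which data is read, which is the reason a well-chosen partition column can cut the amount of data a query has to scan.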

Limitations of Hive

No tool is perfect, and Hive has some limitations:

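  • Hive is built for Online Analytical Processing (OLAP); it doesn’t support Online Transaction Processing (OLTP).
  • Subquery support is limited. Early releases allowed subqueries only in the FROM clause; IN and EXISTS subqueries in the WHERE clause were added in Hive 0.13.
  • It has high latency, so it is poorly suited to interactive or real-time queries.
  • Hive tables don’t support row-level delete or update operations by default. These are available only on transactional (ACID) tables, introduced in Hive 0.14.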
