Big Confusion about Big Data... Understand terms like Hive, Hadoop, HBase, Sqoop...

Hira Jha

Talent Acquisition Specialist @ BNY

Published Oct 19, 2015

Dear Folks !!!

People are often confused about what exactly the term “Big Data” means and what “Hadoop” is.

First, it’s important to define what Big Data is. I use Big Data to mean the data itself, although the term is often used interchangeably with the solutions built around it (such as Hadoop). Big Data is not a technology but a collection of large volumes of structured or unstructured data. I believe that data should satisfy 3 criteria before being considered “Big Data”:

Volume – the amount of data has to be large, in petabytes, not just gigabytes
Velocity – the data has to arrive frequently, daily or even in real time
Structure – the data is typically, but not always, unstructured (like videos, tweets, chats)

Now let us understand the terms we commonly see in a Big Data profile:

Hadoop: Apache Hadoop is an open-source framework for storing, processing and analyzing large volumes of data, much of it unstructured (aka Big Data).

Hadoop Distributed File System: HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.

Hive: Hive is a Hadoop-based data warehousing framework originally developed by Facebook. It allows users to write queries in a SQL-like language called HiveQL, which are then converted to MapReduce jobs.

MapReduce: MapReduce is a software framework that serves as the compute layer of Hadoop. A MapReduce job is divided into two (aptly named) parts. The job's input is split into multiple parts, and the “Map” function processes each part at the node level, emitting intermediate key-value results. The “Reduce” function aggregates the results of the “Map” function to determine the “answer” to the query.
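To make the Map and Reduce steps concrete, here is a minimal sketch of the classic word-count example in plain Python. It does not use Hadoop itself; it only imitates the flow on a single machine. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_pairs):
    """Group all emitted values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    # On a real cluster each split would be mapped on a different node;
    # here the splits are simply processed one after another.
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two "splits" standing in for data blocks on two nodes.
splits = ["big data is big", "hadoop stores big data"]
counts = word_count(splits)
print(counts)
```

In this sketch each string in splits plays the role of the data held by one node. The Map step only ever looks at its own split, and the Reduce step only sees the grouped values for one key, which is what lets the real framework spread the work across many machines.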

Flume: Flume is a framework for populating Hadoop with data. Agents are deployed throughout one's IT infrastructure – inside web servers, application servers and mobile devices, for example – to collect data and integrate it into Hadoop.

Sqoop: Sqoop is a connectivity tool for moving data from non-Hadoop data stores – such as relational databases and data warehouses – into Hadoop. It allows users to specify the target location inside of Hadoop and instruct Sqoop to move data from Oracle, Teradata or other relational databases to the target.

Mahout: Mahout is a data mining library. It takes the most popular data mining algorithms for clustering, regression and statistical modeling and implements them using the MapReduce model.

Thanks...

Hira Jha/-
