Apache Hadoop HBASE
Sheetal Sharma
Intern At IBM Innovation Centre
HBase is ..
● A distributed data store that can scale horizontally to
1,000s of commodity servers and petabytes of indexed
storage.
● Designed to operate on top of the Hadoop distributed file
system (HDFS) or Kosmos File System (KFS, aka
Cloudstore) for scalability, fault tolerance, and high
availability.
Benefits
● Distributed storage
● Table-like data structure (a multi-dimensional map)
● High scalability
● High availability
● High performance
HBase Is Not …
● Tables have one primary index, the row key.
● No join operators.
● Scans and queries can select a subset of available
columns, perhaps by using a wildcard.
● There are three types of lookups:
- Fast lookup using row key and optional timestamp
- Full table scan
- Range scan from region start to end
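The three lookup types can be illustrated with a toy model, assuming the table is nothing more than a map sorted by row key. This is only a sketch: the class LookupSketch and its methods are hypothetical names, String stands in for HBase's byte-array keys, and none of this is the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: a table as a map sorted by row key (String stands in for byte[]).
class LookupSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // 1. Point lookup by row key.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // 2. Range scan: start key inclusive, stop key exclusive, in key order.
    List<String> scan(String startKey, String stopKey) {
        return new ArrayList<>(rows.subMap(startKey, true, stopKey, false).keySet());
    }

    // 3. Full table scan: every row key in sorted order.
    List<String> scanAll() {
        return new ArrayList<>(rows.keySet());
    }
}
```

Because the row key is the only index, any query that cannot be phrased as a key lookup or a key range falls back to the full scan.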
HBase Is Not …(2)
● Limited atomicity and transaction support.
- HBase supports multiple batched mutations of single
rows only.
- Data is unstructured and untyped.
● Not accessed or manipulated via SQL.
- Programmatic access via Java, REST, or Thrift APIs.
- Scripting via JRuby.
Why HBase?
● HBase is a Bigtable clone.
● It is open source
● It has a good community and promise for the
future
● It is developed on top of and has good integration
for the Hadoop platform, if you are using Hadoop
already.
● It has a Cascading connector.
When to use HBase
HBase benefits over RDBMS
● No real indexes
● Automatic partitioning
● Scale linearly and automatically with new nodes
● Commodity hardware
● Fault tolerance
● Batch processing
HBase: Part of Hadoop’s Ecosystem
HBase is built on top of HDFS
HBase files are internally stored in HDFS
HBase vs. HDFS
● Both are distributed systems that scale to hundreds or thousands
of nodes
● HDFS is good for batch processing (scans over big files)
- Not good for record lookup
- Not good for incremental addition of small batches
- Not good for updates
HBase vs. HDFS (Cont’d)
● HBase is designed to efficiently address the above points
- Fast record lookup
- Support for record-level insertion
- Support for updates (not in place)
● HBase updates are done by creating new versions of values
HBase vs. HDFS (Cont’d)
If the application needs neither random reads nor random writes, stick to HDFS.
HBase vs. RDBMS
HBase Data Model
● Data is divided into various tables
● A table is composed of columns, and columns are grouped into column families
HBase Storage Model
● Partitioning
- A table is horizontally partitioned into regions; each region covers a
sequential range of row keys
- Each region is managed by a RegionServer; a single RegionServer may
hold multiple regions
● Persistence and data availability
- HBase stores its data in HDFS. It does not replicate RegionServers;
it relies on HDFS replication for data availability.
- Region data is cached in memory
* Updates and reads are served from the in-memory cache (MemStore)
* The MemStore is flushed periodically to HDFS
* A write-ahead log (stored in HDFS) is used for durability of updates
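The write path described above (log first, then MemStore, then flush) can be sketched as follows. This is a toy model under stated assumptions: WritePathSketch is a hypothetical class, in-memory lists and maps stand in for HDFS files, and the flush is triggered by a size threshold chosen only for illustration, whereas the slide says simply that the MemStore is flushed periodically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: in-memory collections stand in for files stored in HDFS.
class WritePathSketch {
    final List<String> writeAheadLog = new ArrayList<>();                 // stands in for the WAL
    final TreeMap<String, String> memStore = new TreeMap<>();             // sorted in-memory buffer
    final List<TreeMap<String, String>> flushedFiles = new ArrayList<>(); // sorted snapshots
    private final int flushThreshold;                                     // assumption: flush by size

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String rowKey, String value) {
        writeAheadLog.add(rowKey + "=" + value); // 1. append to the log for durability
        memStore.put(rowKey, value);             // 2. apply the update in memory
        if (memStore.size() >= flushThreshold) {
            flush();                             // 3. persist a sorted snapshot
        }
    }

    void flush() {
        flushedFiles.add(new TreeMap<>(memStore));
        memStore.clear();
    }

    // Read the MemStore first, then flushed snapshots from newest to oldest.
    String get(String rowKey) {
        if (memStore.containsKey(rowKey)) {
            return memStore.get(rowKey);
        }
        for (int i = flushedFiles.size() - 1; i >= 0; i--) {
            String value = flushedFiles.get(i).get(rowKey);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

Flushed snapshots are never modified in this sketch, which mirrors the point that updates are not done in place: a newer value simply shadows an older one at read time.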
HBase: Keys and Column Families
Each record is divided into Column Families
Each row has a Key
Each column family consists of one or more Columns
Row key            Timestamp  Column "contents:"  Column "anchor:"
"com.apache.www"   t12        "<html>…"
                   t11        "<html>…"
                   t10                            "anchor:apache.com" = "APACHE"
"com.cnn.www"      t15                            "anchor:cnnsi.com" = "CNN"
                   t13                            "anchor:my.look.ca" = "CNN.com"
                   t6         "<html>…"
                   t5         "<html>…"
                   t3         "<html>…"
• Key
- Byte array
- Serves as the primary key for the table
- Indexed for fast lookup
• Column Family
- Has a name (string)
- Contains one or more related columns
• Column
- Belongs to one column family
- Included inside the row
- Addressed as familyName:columnName
Example: column family named "contents", column family named "anchor", column named "apache.com"
• Version Number
- Unique within each key
- Defaults to the system timestamp
- Data type is Long
• Value (Cell)
- Byte array
Each value in a row carries its own version number
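Taken together, the key, column and version coordinates behave like a nested sorted map. The sketch below is a toy model, not HBase code: DataModelSketch is a hypothetical class, String replaces byte arrays, and a column is written as a single "family:qualifier" string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy model only: row key -> "family:qualifier" -> timestamp (newest first) -> value.
class DataModelSketch {
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Return up to maxVersions values of one cell, newest first.
    List<String> get(String rowKey, String column, int maxVersions) {
        TreeMap<String, TreeMap<Long, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(column)) {
            return Collections.emptyList();
        }
        List<String> values = new ArrayList<>(row.get(column).values());
        return values.subList(0, Math.min(maxVersions, values.size()));
    }
}
```

Asking for one version returns only the value with the highest timestamp, and asking for more walks back through older versions of the same cell.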
HBase Architecture
Three Major Components
• The HBaseMaster
- One master
• The HRegionServer
- Many region servers
• The HBase client
HBase Components
• Region
- A subset of a table’s rows, like horizontal range partitioning
- Automatically done
• RegionServer (many slaves)
- Manages data regions
- Serves data for reads and writes (using a log)
• Master
- Responsible for coordinating the slaves
- Assigns regions, detects failures
- Admin functions
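Range partitioning means that finding the server for a row is a lookup on region start keys. The following is a minimal sketch, assuming each region is identified by its start key and the first region starts at the empty string; RegionLookupSketch and the server names used with it are hypothetical, and this is not how the HBase client is actually implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: regions are keyed by their start row key; "" marks the first region.
class RegionLookupSketch {
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    // The master's role in this sketch: assign a key range to a RegionServer.
    void assign(String startKey, String regionServer) {
        serverByStartKey.put(startKey, regionServer);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String locate(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        return region == null ? null : region.getValue();
    }
}
```

One RegionServer can appear under several start keys, which matches the point that a single server may hold multiple regions.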
Big Picture
ZooKeeper
• HBase depends on ZooKeeper
• By default HBase manages the ZooKeeper instance
- E.g., starts and stops ZooKeeper
• HMaster and HRegionServers register themselves with ZooKeeper
Creating a Table
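import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Uses the older client API shown on the slide; later HBase releases deprecate it.
public class CreateTable {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        HBaseAdmin admin = new HBaseAdmin(config);
        HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
        desc.addFamily(new HColumnDescriptor("columnFamily1")); // family name without trailing ':'
        desc.addFamily(new HColumnDescriptor("columnFamily2"));
        admin.createTable(desc);
        admin.close();
    }
}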
Operations On Regions: Get()
• Given a key, return the corresponding record
• For each value, return the highest version
• Can control the number of versions you want
Get(): Select value from table where key='com.apache.www' AND label='anchor:apache.com'
Row key            Timestamp  Column "anchor:"
"com.apache.www"   t12
                   t11
                   t10        "anchor:apache.com" = "APACHE"
"com.cnn.www"      t9         "anchor:cnnsi.com" = "CNN"
                   t8         "anchor:my.look.ca" = "CNN.com"
                   t6
                   t5
                   t3
Operations On Regions: Scan()
Scan(): Select value from table where anchor='cnnsi.com'
Row key            Timestamp  Column "anchor:"
"com.apache.www"   t12
                   t11
                   t10        "anchor:apache.com" = "APACHE"
"com.cnn.www"      t9         "anchor:cnnsi.com" = "CNN"
                   t8         "anchor:my.look.ca" = "CNN.com"
                   t6
                   t5
                   t3
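Because the condition in this scan is on a column rather than on the row key, every row has to be visited. A minimal sketch of that behaviour, assuming a toy table that keeps only the latest value per column; ScanSketch and scanColumn are hypothetical names and not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: row key -> column ("family:qualifier") -> latest value.
class ScanSketch {
    private final TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        table.computeIfAbsent(rowKey, r -> new TreeMap<>()).put(column, value);
    }

    // Visit rows in key order and collect the values stored under one column.
    List<String> scanColumn(String column) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }
}
```

With the rows from the table above, scanning for the column "anchor:cnnsi.com" yields the single value "CNN" from the row "com.cnn.www".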
Operations On Regions: Put()
● Insert a new record (with a new key), or
● Insert a record for an existing key
The version number can be set in two ways:
- Implicit version number (the timestamp at write time)
- Explicit version number (supplied by the caller)
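The difference between implicit and explicit version numbers can be sketched with two overloads. This is an illustration under assumptions, not the HBase Put class: PutSketch is a hypothetical name, it models a single cell, and the clock is injected so the behaviour is easy to follow.

```java
import java.util.TreeMap;
import java.util.function.LongSupplier;

// Toy model only: a single cell holding timestamp -> value.
class PutSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    private final LongSupplier clock;

    PutSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Implicit version: the store stamps the write with its own clock.
    long put(String value) {
        return put(clock.getAsLong(), value);
    }

    // Explicit version: the caller supplies the timestamp.
    long put(long timestamp, String value) {
        versions.put(timestamp, value);
        return timestamp;
    }

    // The visible value is the one with the highest timestamp.
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }
}
```

A caller might construct it as new PutSketch(System::currentTimeMillis). Note that an explicit timestamp older than an existing version does not change what a plain read returns, since the highest version still wins.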
Operations On Regions: Delete()
• Marking table cells as deleted
• Multiple levels
- Can mark an entire column family as deleted
- Can mark all column families of a given row as deleted
• All operations are logged by the RegionServers
• The log is flushed periodically
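Deleting by marking can be sketched as delete markers that hide cells at read time rather than erasing them. The model below is a simplification under assumptions: DeleteSketch is a hypothetical class, it covers one row, keeps a single version per column for brevity, and expects column names of the form "family:qualifier".

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: one row whose cells are hidden by delete markers, not erased.
class DeleteSketch {
    private final Map<String, Long> cellTimestamps = new HashMap<>();
    private final Map<String, String> cellValues = new HashMap<>();
    private final Map<String, Long> columnMarkers = new HashMap<>();
    private final Map<String, Long> familyMarkers = new HashMap<>();
    private long rowMarker = Long.MIN_VALUE;

    // Column must look like "family:qualifier"; only one version is kept per column.
    void put(String column, long timestamp, String value) {
        cellTimestamps.put(column, timestamp);
        cellValues.put(column, value);
    }

    void deleteColumn(String column, long timestamp) { columnMarkers.put(column, timestamp); }
    void deleteFamily(String family, long timestamp) { familyMarkers.put(family, timestamp); }
    void deleteRow(long timestamp) { rowMarker = timestamp; }

    // A cell is visible unless a marker covering it is at least as new as the cell.
    String get(String column) {
        Long written = cellTimestamps.get(column);
        if (written == null) {
            return null;
        }
        String family = column.substring(0, column.indexOf(':'));
        boolean hidden = rowMarker >= written
                || familyMarkers.getOrDefault(family, Long.MIN_VALUE) >= written
                || columnMarkers.getOrDefault(column, Long.MIN_VALUE) >= written;
        return hidden ? null : cellValues.get(column);
    }
}
```

A write with a timestamp newer than the marker becomes visible again, which is the practical consequence of marking cells instead of removing them.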
Altering a Table
Disable the table before changing the schema
Logging Operations
HBase Deployment
[Figure: deployment diagram showing the master node and the slave nodes]
References
● Introduction to HBase: trac.nchc.org.tw/cloud/raw-attachment/wiki/.../hbase_intro.ppt
● web.cs.wpi.edu/~cs525/s13-MYE/lectures/5/HBase.pptx
● www-users.cselabs.umn.edu/classes/Spring.../Hadoop-HBase-Tutorial.ppt
● www.cs.kent.edu/~jin/Cloud12Spring/HbaseHivePig.pptx