HBase
HBase: Overview
• HBase is a distributed, column-oriented data store built on top of HDFS
• HBase is an Apache open source project whose goal is to provide storage for Hadoop distributed computing
• Data is logically organized into tables, rows, and columns
HBase: Part of Hadoop’s Ecosystem
HBase is built on top of HDFS
HBase files are internally stored in HDFS
HBase vs. HDFS
• Both are distributed systems that scale to hundreds or thousands of nodes
• HDFS is good for batch processing (scans over big files)
– Not good for record lookup
– Not good for incremental addition of small batches
– Not good for updates
HBase vs. HDFS (Cont’d)
• HBase is designed to efficiently address the above points
– Fast record lookup
– Support for record-level insertion
– Support for updates (not in place)
• HBase updates are done by creating new versions of values
HBase vs. HDFS (Cont’d)
If an application needs neither random reads nor random writes → stick to HDFS
HBase Data Model
• HBase is based on Google’s Bigtable model
– Key-value pairs
[Figure labels: row key, column family, value, timestamp]
HBase Logical View
HBase: Keys and Column Families
Each row has a Key
Each record is divided into Column Families
Each column family consists of one or more Columns
• Key
– Byte array
– Serves as the primary key for the table
– Indexed for fast lookup
• Column Family
– Has a name (string)
– Contains one or more related columns
• Column
– Belongs to one column family
– Included inside the row
– Written as familyName:columnName
Row key            Time stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com”  “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com”   “CNN”
                   t13                               “anchor:my.look.ca”  “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”
Column family named “contents”
Column family named “anchor”
Column named “apache.com”
• Version Number
– Unique within each key
– By default, the system’s timestamp
– Data type is Long
• Value (Cell)
– Byte array
The time stamp column holds the version number of each row value
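The coordinates listed above (row key, column family and column, version) can be pictured as a nested sorted map. What follows is only a minimal in-memory sketch: the class `CellMapSketch` and its methods are hypothetical names, not part of the HBase API, and Strings stand in for the byte arrays HBase actually stores.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical in-memory model of the logical view; not the HBase API.
class CellMapSketch {
    // row key -> "family:column" -> version (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    // Stores one value under its full coordinates.
    void put(String rowKey, String column, long version, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(version, value);
    }

    // Returns the value with the highest version, or null if the cell is empty.
    String getLatest(String rowKey, String column) {
        TreeMap<String, TreeMap<Long, String>> row = rows.get(rowKey);
        if (row == null) return null;
        TreeMap<Long, String> versions = row.get(column);
        if (versions == null) return null;
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        CellMapSketch table = new CellMapSketch();
        table.put("com.cnn.www", "contents:", 5L, "<html> at t5");
        table.put("com.cnn.www", "contents:", 6L, "<html> at t6");
        table.put("com.cnn.www", "anchor:cnnsi.com", 15L, "CNN");
        System.out.println(table.getLatest("com.cnn.www", "contents:"));
        System.out.println(table.getLatest("com.cnn.www", "anchor:cnnsi.com"));
    }
}
```

Cells that were never written simply have no entry in the map, which matches the idea of a sparse table.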
Notes on Data Model
• An HBase schema consists of several tables
• Each table consists of a set of column families
– Columns are not part of the schema
• HBase has dynamic columns
– Because column names are encoded inside the cells
– Different cells can have different columns
Example: the “Roles” column family has different columns in different cells
Notes on Data Model (Cont’d)
• The version number can be user-supplied
– It does not even have to be inserted in increasing order
– Version numbers are unique within each key
• A table can be very sparse
– Many cells are empty
• Keys are indexed as the primary key
Example: the “anchor” family of row “com.cnn.www” has two columns [cnnsi.com & my.look.ca]
HBase Physical Model
• Each column family is stored in its own files (called HFiles)
• Key & version numbers are replicated with each column family
• Empty cells are not stored
HBase maintains a multi-level index on values: <key, column family, column name, timestamp>
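A rough way to picture the physical layout of one column family is a sorted list of self-describing cells: each cell repeats its key and version, and an empty cell has no entry at all. The sketch below is an illustration only. `FamilyStoreSketch` and `Cell` are hypothetical names, and the sort order (row key, then column name, then newest version first) is an assumption of this sketch rather than something stated on the slide.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the cells stored for ONE column family.
class FamilyStoreSketch {
    // Every stored cell carries its own row key and version.
    static final class Cell {
        final String rowKey;
        final String column;
        final long version;
        final String value;

        Cell(String rowKey, String column, long version, String value) {
            this.rowKey = rowKey;
            this.column = column;
            this.version = version;
            this.value = value;
        }
    }

    // Assumed order: row key, then column name, then newest version first.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.rowKey.compareTo(b.rowKey);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.version, a.version);
    };

    // Returns a sorted copy, leaving the input list untouched.
    static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Cell> anchor = new ArrayList<>();
        anchor.add(new Cell("com.cnn.www", "my.look.ca", 13L, "CNN.com"));
        anchor.add(new Cell("com.cnn.www", "cnnsi.com", 15L, "CNN"));
        anchor.add(new Cell("com.apache.www", "apache.com", 10L, "APACHE"));
        for (Cell cell : sorted(anchor)) {
            System.out.println(cell.rowKey + " " + cell.column + " " + cell.version + " " + cell.value);
        }
    }
}
```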
Example
Column Families
HBase Regions
• Each table is partitioned horizontally, by row-key range, into regions
– Regions are the counterpart of HDFS blocks
Each horizontal partition will be one region
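Because a region covers a contiguous range of row keys, finding the region for a key amounts to a floor lookup over the region start keys. The following is a small sketch under that assumption; `RegionLookupSketch`, the region names, and the split points are all made up for illustration and are not HBase classes or defaults.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of horizontal range partitioning by row key.
class RegionLookupSketch {
    // start row key (inclusive) -> region name; "" marks the start of the first region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");
        table.addRegion("com.cnn", "region-2");
        table.addRegion("org", "region-3");
        System.out.println(table.regionFor("com.apache.www"));
        System.out.println(table.regionFor("com.cnn.www"));
        System.out.println(table.regionFor("org.example"));
    }
}
```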
HBase Architecture
Three Major Components
• The HBaseMaster
– One master
• The HRegionServer
– Many region servers
• The HBase client
HBase Components
• Region
– A subset of a table’s rows, like horizontal range partitioning
– Automatically done
• RegionServer (many slaves)
– Manages data regions
– Serves data for reads and writes (using a log)
• Master
– Responsible for coordinating the slaves
– Assigns regions, detects failures
– Admin functions
Big Picture
ZooKeeper
• HBase depends on ZooKeeper
• By default, HBase manages the ZooKeeper instance
– E.g., starts and stops ZooKeeper
• HMaster and HRegionServers register themselves with ZooKeeper
Creating a Table
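import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// config is an existing HBase Configuration object (older client API)
HBaseAdmin admin = new HBaseAdmin(config);
// One descriptor per column family; later HBase releases reject ':' in family names
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1");
column[1] = new HColumnDescriptor("columnFamily2");
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
// Create the table with both column families
admin.createTable(desc);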
Operations On Regions: Get()
• Given a key → return the corresponding record
• For each value, return the highest version
• Can control the number of versions you want
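The Get() behaviour described above can be sketched without the HBase client: given one row, return every column with its newest value, or with its newest N values when more versions are requested. `GetSketch` and the `maxVersions` parameter are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Get() over a single in-memory row.
class GetSketch {
    // row: "family:column" -> version -> value.
    // Returns, per column, up to maxVersions values ordered newest first.
    static Map<String, List<String>> get(Map<String, TreeMap<Long, String>> row, int maxVersions) {
        Map<String, List<String>> record = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Long, String>> column : row.entrySet()) {
            List<String> newestFirst = new ArrayList<>();
            for (String value : column.getValue().descendingMap().values()) {
                if (newestFirst.size() == maxVersions) break;
                newestFirst.add(value);
            }
            record.put(column.getKey(), newestFirst);
        }
        return record;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> contents = new TreeMap<>();
        contents.put(3L, "<html> at t3");
        contents.put(5L, "<html> at t5");
        contents.put(6L, "<html> at t6");
        Map<String, TreeMap<Long, String>> row = new TreeMap<>();
        row.put("contents:", contents);

        System.out.println(get(row, 1)); // only the highest version
        System.out.println(get(row, 2)); // the two highest versions
    }
}
```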
Operations On Regions: Scan()
Get()
Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”  “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”   “CNN”
                   t8           “anchor:my.look.ca”  “CNN.com”
                   t6
                   t5
                   t3
Select value from table where key=‘com.apache.www’ AND label=‘anchor:apache.com’
Scan()
Select value from table where anchor=‘cnnsi.com’
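The difference from Get() is that a scan has no row key to go to directly: it walks the rows in key order and keeps those that carry the requested column. A minimal sketch of that idea follows, assuming each row holds only its latest values; `ScanSketch` is a hypothetical name and does not mirror the HBase Scan API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Scan(): visit rows in key order and filter on one column.
class ScanSketch {
    // table: row key -> ("family:column" -> latest value)
    static List<String> scan(TreeMap<String, Map<String, String>> table, String column) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) {
                hits.add(row.getKey() + " -> " + value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> apache = new HashMap<>();
        apache.put("anchor:apache.com", "APACHE");
        Map<String, String> cnn = new HashMap<>();
        cnn.put("anchor:cnnsi.com", "CNN");
        cnn.put("anchor:my.look.ca", "CNN.com");

        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("com.apache.www", apache);
        table.put("com.cnn.www", cnn);

        // Corresponds to: Select value from table where anchor='cnnsi.com'
        System.out.println(scan(table, "anchor:cnnsi.com"));
    }
}
```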
Operations On Regions: Put()
• Insert a new record (with a new key), or
• Insert a record for an existing key
A put can carry an implicit version number (the timestamp) or an explicit version number
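The two ways of versioning a put can be sketched on a single cell: without a version the current system time is used, and with an explicit version the caller decides, even out of order. `PutSketch` and its methods are hypothetical names for illustration, not the HBase Put class.

```java
import java.util.TreeMap;

// Hypothetical sketch of Put() on one cell with implicit and explicit versions.
class PutSketch {
    // version -> value; a put never overwrites another version in place
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Implicit version: stamp the value with the current system time.
    void put(String value) {
        put(System.currentTimeMillis(), value);
    }

    // Explicit version: supplied by the caller, not necessarily increasing.
    void put(long version, String value) {
        versions.put(version, value);
    }

    // Value with the highest version, or null if nothing was written.
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        PutSketch cell = new PutSketch();
        cell.put(20L, "written second, but newer");
        cell.put(10L, "written last, but older");
        System.out.println(cell.latest());
        cell.put("stamped with the system time");
        System.out.println(cell.latest() + " (" + cell.versionCount() + " versions)");
    }
}
```

In this sketch, two implicit puts within the same millisecond would land on the same version and the later one would replace the earlier one.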
Operations On Regions: Delete()
• Marks table cells as deleted
• Multiple levels
– Can mark an entire column family as deleted
– Can mark all column families of a given row as deleted
• All operations are logged by the RegionServers
• The log is flushed periodically
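Marking cells as deleted, rather than erasing them, can be sketched with a tombstone per column family. In this illustration a tombstone carries a version and hides every cell at or below it; that rule, the class `DeleteSketch`, and its method names are assumptions of the sketch, not the HBase Delete API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Delete() as tombstone markers on a single row.
class DeleteSketch {
    // "family:column" -> version -> value
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();
    // family -> tombstone version; cells at or below it are hidden, not erased
    private final Map<String, Long> familyTombstones = new HashMap<>();

    void put(String family, String column, long version, String value) {
        cells.computeIfAbsent(family + ":" + column, k -> new TreeMap<Long, String>())
             .put(version, value);
    }

    // Marks an entire column family as deleted up to the given version.
    void deleteFamily(String family, long version) {
        familyTombstones.merge(family, version, Long::max);
    }

    // Marks all column families of the row as deleted.
    void deleteRow(long version) {
        for (String key : cells.keySet()) {
            deleteFamily(key.substring(0, key.indexOf(':')), version);
        }
    }

    // Newest visible value, or null if the cell is empty or hidden by a tombstone.
    String get(String family, String column) {
        TreeMap<Long, String> versions = cells.get(family + ":" + column);
        if (versions == null) return null;
        Map.Entry<Long, String> newest = versions.lastEntry();
        Long tombstone = familyTombstones.get(family);
        boolean hidden = tombstone != null && newest.getKey() <= tombstone;
        return hidden ? null : newest.getValue();
    }

    public static void main(String[] args) {
        DeleteSketch row = new DeleteSketch();
        row.put("anchor", "cnnsi.com", 9L, "CNN");
        row.put("contents", "", 6L, "<html>");
        row.deleteFamily("anchor", 10L);
        System.out.println(row.get("anchor", "cnnsi.com"));
        System.out.println(row.get("contents", ""));
        row.deleteRow(10L);
        System.out.println(row.get("contents", ""));
    }
}
```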
HBase: Joins
• HBase does not support joins
• Can be done in the application layer
– Using scan() and get() operations
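An application-layer join can be sketched as a scan over one table combined with a get into the other for each scanned row. The tables here (orders keyed by order id, customers keyed by customer id) are invented for the example, and `AppJoinSketch` is a hypothetical name; plain maps stand in for the two HBase tables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a join done in the application with scan() and get().
class AppJoinSketch {
    // orders: order id -> customer id (scanned in key order)
    // customers: customer id -> customer name (looked up by key)
    static List<String> join(TreeMap<String, String> orders, Map<String, String> customers) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> order : orders.entrySet()) { // plays the role of scan()
            String name = customers.get(order.getValue());          // plays the role of get()
            if (name != null) {                                      // inner join: skip unmatched rows
                joined.add(order.getKey() + "," + name);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        TreeMap<String, String> orders = new TreeMap<>();
        orders.put("order-1", "cust-7");
        orders.put("order-2", "cust-9");
        orders.put("order-3", "cust-7");

        Map<String, String> customers = new HashMap<>();
        customers.put("cust-7", "Alice");

        System.out.println(join(orders, customers));
    }
}
```

Each scanned row costs one extra lookup, so the work grows with the size of the scanned table.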
Altering a Table
Disable the table before changing the schema
Logging Operations
HBase Deployment
Master node
Slave nodes
HBase vs. HDFS
HBase vs. RDBMS
When to use HBase
Thank you:)
HBASE, HIVE , ARCHITECTURE AND WORKING EXAMPLES
