HBASE – THE SCALABLE
DATA STORE
An Introduction to HBase
JAX UK, October 2012

Lars George
Director EMEA Services
About Me

•  Director EMEA Services @ Cloudera
    •  Consulting on Hadoop projects (everywhere)
•  Apache Committer
    •  HBase and Whirr
•  O’Reilly Author
    •  HBase – The Definitive Guide
      •  Now in Japanese!

•  Contact
    •  lars@cloudera.com      (The Japanese edition is out now!)
    •  @larsgeorge
Agenda

•  Introduction to HBase
•  HBase Architecture
•  MapReduce with HBase
•  Advanced Techniques
•  Current Project Status
INTRODUCTION TO HBASE
Why Hadoop/HBase?

•  Datasets are constantly growing and intake soars
    •  Yahoo! has 140PB+ and 42k+ machines
    •  Facebook adds 500TB+ per day, 100PB+ raw data, on
       tens of thousands of machines
    •  Are you “throwing” data away today?
•  Traditional databases are expensive to scale and
   inherently difficult to distribute
•  Commodity hardware is cheap and powerful
   •  $1000 buys you 4-8 cores/4GB/1TB
   •  600GB 15k RPM SAS nearly $500
•  Need for random access and batch processing
    •  Hadoop only supports batch/streaming
History of Hadoop/HBase

•  Google solved its scalability problems
    •  “The Google File System” published October 2003
      •  Hadoop DFS
   •  “MapReduce: Simplified Data Processing on Large
     Clusters” published December 2004
      •  Hadoop MapReduce
   •  “BigTable: A Distributed Storage System for
     Structured Data” published November 2006
      •  HBase
Hadoop Introduction

•  Two main components
    •  Hadoop Distributed File System (HDFS)
       •  A scalable, fault-tolerant, high performance distributed file
         system capable of running on commodity hardware
   •  Hadoop MapReduce
       •  Software framework for distributed computation

•  Significant adoption
    •  Used in production in hundreds of organizations
    •  Primary contributors: Yahoo!, Facebook, Cloudera
HDFS: Hadoop Distributed File System

•  Reliably store petabytes of replicated data across
 thousands of nodes
   •  Data divided into 64MB blocks, each block replicated
     three times
•  Master/Slave architecture
    •  Master NameNode contains block locations
    •  Slave DataNode manages block on local file system
•  Built on commodity hardware
    •  No 15k RPM disks or RAID required (nor wanted!)
MapReduce

•  Distributed programming model to reliably
 process petabytes of data using its locality
   •  Built-in bindings for Java and C
   •  Can be used with any language via Hadoop
     Streaming
•  Inspired by map and reduce functions in
 functional programming

 Input  →  Map()  →  Copy/Sort  →  Reduce()  →  Output
Hadoop…

•  … is designed to store and stream extremely large
   datasets in batch
•  … is not intended for realtime querying
•  … does not support random access
•  … does not handle billions of small files well
   •  Files smaller than the default block size of 64MB
   •  Keeps “inodes” in memory on master
•  … offers no special support for structured data over
 unstructured or complex data

              That is why we have HBase!
Why HBase and not …?

•  Question: Why HBase and not <put-your-favorite-
   nosql-solution-here>?
•  What else is there?
   •    Key/value stores
   •    Document-oriented stores
   •    Column-oriented stores
   •    Graph-oriented stores
•  Features to ask for
    •  In memory or persistent?
    •  Strict or eventual consistency?
    •  Distributed or single machine (or afterthought)?
    •  Designed for read and/or write speeds?
    •  How does it scale? (if that is what you need)
What is HBase?

•  Distributed
•  Column-Oriented
•  Multi-Dimensional
•  High-Availability (CAP anyone?)
•  High-Performance
•  Storage System

                       Project Goals
   Billions of Rows * Millions of Columns * Thousands of
                            Versions
    Petabytes across thousands of commodity servers
HBase is not…

•  An SQL Database
    •  No joins, no query engine, no types, no SQL
    •  Transactions and secondary indexes only as add-ons but
       immature
•  A drop-in replacement for your RDBMS
•  You must be OK with RDBMS anti-schema
    •  Denormalized data
    •  Wide and sparsely populated tables
    •  Just say “no” to your inner DBA


               Keyword: Impedance Match
HBase Tables

•  Tables are sorted by the Row Key in
   lexicographical order
•  Table schema only defines its Column Families
  •  Each family consists of any number of Columns
  •  Each column consists of any number of Versions
  •  Columns only exist when inserted, NULLs are free
  •  Columns within a family are sorted and stored
     together
  •  Everything except table names is byte[]


(Table, Row, Family:Column, Timestamp) → Value
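
A minimal sketch of how these coordinates translate into a write with the native Java client. The table name "webtable" and the family/qualifier values are assumptions for illustration, and the usual org.apache.hadoop.hbase.client and org.apache.hadoop.hbase.util.Bytes imports are assumed:

   Configuration conf = HBaseConfiguration.create();
   HTable table = new HTable(conf, "webtable");            // assumed table name
   Put put = new Put(Bytes.toBytes("com.cloudera.www"));   // row key
   // (Table, Row, Family:Column, Timestamp) → Value; the timestamp defaults to the current time
   put.add(Bytes.toBytes("meta"), Bytes.toBytes("type"), Bytes.toBytes("text/html"));
   table.put(put);
   table.close();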
Column Family vs. Column

•  Use only a few column families
    •  Causes many files that need to stay open per region
       plus class overhead per family
•  Best used when logical separation between data
   and meta columns
•  Sorting per family can be used to convey
   application logic or access pattern
HBase Architecture

•  Table is made up of any number of regions
•  Region is specified by its startKey and endKey
    •  Empty table: (Table, NULL, NULL)
    •  Two-region table: (Table, NULL, “com.cloudera.www”)
       and (Table, “com.cloudera.www”, NULL)
•  Each region may live on a different node and is
 made up of several HDFS files and blocks, each
 of which is replicated by Hadoop
HBase Architecture (cont.)

•  Two types of HBase nodes:
        Master and RegionServer
•  Special tables -ROOT- and .META. store schema
   information and region locations
•  Master server responsible for RegionServer
   monitoring as well as assignment and load
   balancing of regions
•  Uses ZooKeeper as its distributed coordination
   service
  •  Manages Master election and server availability
Web Crawl Example

•  Canonical use-case for BigTable
•  Store web crawl data
    •  Table webtable with family content and meta
    •  Row key is the reversed URL, with Columns
      •  content:data stores the raw crawled data
      •  meta:language stores http language header
      •  meta:type stores http content-type header
   •  While processing raw data for hyperlinks and images,
     add families links and images
      •  links:<rurl> column for each hyperlink
      •  images:<rurl> column for each image
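
As an illustration of this layout, a hedged sketch of storing one crawled page through the Java API; the URL and values are made up, the families match the slide, and the usual client imports are assumed:

   HTable webtable = new HTable(HBaseConfiguration.create(), "webtable");
   // the row key is the reversed URL so pages of one domain sort next to each other
   Put put = new Put(Bytes.toBytes("com.cloudera.www/index.html"));
   put.add(Bytes.toBytes("content"), Bytes.toBytes("data"),
       Bytes.toBytes("<html>...</html>"));                   // placeholder for the raw page
   put.add(Bytes.toBytes("meta"), Bytes.toBytes("language"), Bytes.toBytes("en"));
   put.add(Bytes.toBytes("meta"), Bytes.toBytes("type"), Bytes.toBytes("text/html"));
   webtable.put(put);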
HBase Clients

•  Native Java Client/API
•  Non-Java Clients
    •  REST server
    •  Avro server
    •  Thrift server
    •  Jython, Scala, Groovy DSL
•  TableInputFormat/TableOutputFormat for
 MapReduce
   •  HBase as MapReduce source and/or target
•  HBase Shell
    •  JRuby shell adding get, put, scan and admin calls
Java API

•  CRUD
    •  get: retrieve an entire, or partial row (R)
    •  put: create and update a row (CU)
    •  delete: delete a cell, column, columns, or row (D)


      Result get(Get get) throws IOException;

      void put(Put put) throws IOException;

      void delete(Delete delete) throws IOException;
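
A short usage sketch of these calls; the table name "test", family "cf", and all values are assumptions, and the usual HBase client imports are assumed:

      HTable table = new HTable(HBaseConfiguration.create(), "test");

      Put put = new Put(Bytes.toBytes("row-1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
      table.put(put);                                     // create/update (CU)

      Get get = new Get(Bytes.toBytes("row-1"));
      get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
      Result result = table.get(get);                     // read (R)
      byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));

      Delete delete = new Delete(Bytes.toBytes("row-1"));
      delete.deleteColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
      table.delete(delete);                               // delete (D)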
Java API (cont.)

•  CRUD+SI
    •  scan:      Scan any number of rows (S)
    •  increment: Increment a column value (I)




ResultScanner getScanner(Scan scan) throws IOException;

Result increment(Increment increment) throws IOException;
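
Continuing the sketch from the previous slide with a bounded scan and an atomic counter increment; the row keys and the counter column are assumptions:

   Scan scan = new Scan(Bytes.toBytes("row-100"), Bytes.toBytes("row-200"));
   scan.addFamily(Bytes.toBytes("cf"));
   ResultScanner scanner = table.getScanner(scan);
   try {
     for (Result row : scanner) {
       // process each Result (S)
     }
   } finally {
     scanner.close();
   }

   Increment increment = new Increment(Bytes.toBytes("row-1"));
   increment.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("counter"), 1L);
   Result updated = table.increment(increment);        // atomic increment (I)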
Java API (cont.)

•  CRUD+SI+CAS
    •  Atomic compare-and-swap (CAS)


•  Combined get, check, and put operation
•  Helps to overcome lack of full transactions
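
A hedged example of the combined operation via HTable.checkAndPut(); the row, family, and values are made up. The Put is only applied if the current cell value still matches the expected one:

   Put put = new Put(Bytes.toBytes("row-1"));
   put.add(Bytes.toBytes("cf"), Bytes.toBytes("state"), Bytes.toBytes("shipped"));

   // apply the Put only if cf:state still holds the expected value "ordered"
   boolean applied = table.checkAndPut(
       Bytes.toBytes("row-1"),
       Bytes.toBytes("cf"), Bytes.toBytes("state"),
       Bytes.toBytes("ordered"),
       put);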
Batch Operations

•  Support Get, Put, and Delete
•  Reduce network round-trips
•  If possible, batch operations to the server to gain
 better overall throughput

    void batch(List<Row> actions, Object[] results)
      throws IOException, InterruptedException;

    Object[] batch(List<Row> actions)
      throws IOException, InterruptedException;
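
A small sketch of mixing operations in one batch call; the row keys and values are assumptions:

    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("1"));

    List<Row> actions = new ArrayList<Row>();
    actions.add(put);
    actions.add(new Get(Bytes.toBytes("row-2")));
    actions.add(new Delete(Bytes.toBytes("row-3")));

    Object[] results = new Object[actions.size()];
    table.batch(actions, results);    // grouped per region server, fewer round-trips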
Filters

•  Can be used with Get and Scan operations
•  Server side hinting
•  Reduce data transferred to client
•  Filters are no guarantee for fast scans
    •  Still full table scan in worst-case scenario
    •  Might have to implement your own
•  Filters can hint next row key
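
For example, a scan restricted server-side with one of the supplied filters (the prefix and column names are made up); note that the filter reduces what is shipped to the client, not necessarily what is read on the servers:

   Scan scan = new Scan();
   scan.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("type"));
   // evaluated on the region servers: only rows whose key starts with the prefix are returned
   scan.setFilter(new PrefixFilter(Bytes.toBytes("com.cloudera.")));
   ResultScanner scanner = table.getScanner(scan);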
HBase Extensions

•  Hive, Pig, Cascading
    •  Hadoop-targeted MapReduce tools with HBase
       integration
•  Sqoop
    •  Read and write to HBase for further processing in
       Hadoop
•  HBase Explorer, Nutch, Heritrix
•  SpringData
•  Toad
History of HBase
•  November 2006
     •  Google releases paper on BigTable
•  February 2007
     •  Initial HBase prototype created as Hadoop contrib
•  October 2007
     •  First “useable” HBase (Hadoop 0.15.0)
•  January 2008
     •  Hadoop becomes TLP, HBase becomes subproject
•  October 2008
     •  HBase 0.18.1 released
•  January 2009
     •  HBase 0.19.0
•  September 2009
     •  HBase 0.20.0 released (Performance Release)
•  May 2010
     •  HBase becomes TLP
•  June 2010
     •  HBase 0.89.20100621, first developer release
•  May 2011
     •  HBase 0.90.3 release
HBase Users

•  Adobe
•  eBay
•  Facebook
•  Mozilla (Socorro)
•  Trend Micro (Advanced Threat Research)
•  Twitter
•  Yahoo!
•  …
HBASE ARCHITECTURE
HBase Architecture
HBase Architecture (cont.)
HBase Architecture (cont.)

•  Based on Log-Structured Merge-Trees (LSM-Trees)
•  Inserts are done in write-ahead log first
•  Data is stored in memory and flushed to disk on
   regular intervals or based on size
•  Small flushes are merged in the background to keep
   number of files small
•  Reads check the in-memory stores first and the disk-based
   files second
•  Deletes are handled with “tombstone” markers
•  Atomicity on row level no matter how many columns
   •  keeps the locking model simple
Write Ahead Log
MAPREDUCE WITH HBASE
MapReduce with HBase

•  Framework to use HBase as source and/or sink for
   MapReduce jobs
•  Thin layer over native Java API
•  Provides helper class to set up jobs easier

   TableMapReduceUtil.initTableMapperJob(
      "test", scan, MyMapper.class,
      ImmutableBytesWritable.class,
      RowResult.class, job);


   TableMapReduceUtil.initTableReducerJob(
      "table", MyReducer.class, job);
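
For context, a hedged sketch of a complete map-only job setup around these helpers; the class and table names are placeholders, and current client versions take Result.class as the value class rather than the older RowResult.class:

   Configuration conf = HBaseConfiguration.create();
   Job job = new Job(conf, "hbase-scan-job");
   job.setJarByClass(MyMapper.class);

   Scan scan = new Scan();
   scan.setCaching(500);          // fetch more rows per RPC
   scan.setCacheBlocks(false);    // avoid churning the block cache on a full scan

   TableMapReduceUtil.initTableMapperJob(
       "test", scan, MyMapper.class,
       ImmutableBytesWritable.class, Result.class, job);
   job.setNumReduceTasks(0);      // map-only, often sufficient for sorted tables
   job.waitForCompletion(true);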
MapReduce with HBase (cont.)

•  Special use-case in regards to Hadoop
•  Tables are sorted and have unique keys
    •  Often we do not need a Reducer phase
    •  Combiner not needed
•  Need to make sure load is distributed properly by
   randomizing keys (or use bulk import)
•  Partial or full table scans possible
•  Scans are very efficient as they make use of block
   caches
   •  But make sure you do not create too much churn; better
     yet, switch block caching off when doing full table scans.
•  Can use filters to limit rows being processed
TableInputFormat

•  Transforms an HBase table into a source for
   MapReduce jobs
•  Internally uses a TableRecordReader which
   wraps a Scan instance
   •  Supports restarts to handle temporary issues
•  Splits table by region boundaries and stores
 current region locality
TableOutputFormat

•  Allows using an HBase table as an output target
•  Put and Delete support from mapper or reducer
   class
•  Uses TableOutputCommitter to write data
•  Disables auto-flush on the table to make use of the
   client-side write buffer
•  Handles final flush in close()
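
A hedged sketch of a reducer writing through TableOutputFormat; the class, family, and qualifier names are made up, and the job would be wired up with initTableReducerJob() as shown earlier:

   import java.io.IOException;
   import org.apache.hadoop.hbase.client.Put;
   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
   import org.apache.hadoop.hbase.mapreduce.TableReducer;
   import org.apache.hadoop.hbase.util.Bytes;
   import org.apache.hadoop.io.IntWritable;
   import org.apache.hadoop.io.Text;

   public class CountReducer
       extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

     @Override
     protected void reduce(Text key, Iterable<IntWritable> values, Context context)
         throws IOException, InterruptedException {
       int sum = 0;
       for (IntWritable value : values) {
         sum += value.get();
       }
       Put put = new Put(Bytes.toBytes(key.toString()));
       put.add(Bytes.toBytes("stats"), Bytes.toBytes("count"), Bytes.toBytes(sum));
       // TableOutputFormat turns each emitted Put into a buffered write on the target table
       context.write(new ImmutableBytesWritable(put.getRow()), put);
     }
   }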
HFileOutputFormat

•  Used to bulk load data into HBase
•  Bypasses normal API and generates low-level
   store files
•  Prepares files for final bulk insert
•  Needs special handling of sort order and
   partitioning
•  Only supports one column family (for now)
•  Can load bulk updates into existing tables
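
A sketch of preparing such a bulk load (table name and staging path are assumptions); configureIncrementalLoad() inspects the table's region boundaries and sets up the matching partitioner and reducer, and the generated files are then moved into place with the completebulkload tool:

   Configuration conf = HBaseConfiguration.create();
   Job job = new Job(conf, "prepare-bulk-load");
   job.setMapOutputKeyClass(ImmutableBytesWritable.class);
   job.setMapOutputValueClass(KeyValue.class);

   HTable table = new HTable(conf, "webtable");          // assumed target table
   // align partitioning and sort order with the table's current regions
   HFileOutputFormat.configureIncrementalLoad(job, table);
   FileOutputFormat.setOutputPath(job, new Path("/tmp/bulk"));   // assumed staging path
   job.waitForCompletion(true);
   // then move the HFiles into the table, e.g. via LoadIncrementalHFiles (completebulkload)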
MapReduce Helper

•  TableMapReduceUtil
•  IdentityTableMapper
     •  Passes on key and value, where value is a Result
        instance and key is set to value.getRow()
•  IdentityTableReducer
     •  Stores values into HBase, must be Put or Delete
        instances
•  HRegionPartitioner
    •  Not set by default, use it to control partitioning on the
       Hadoop level
Custom MapReduce over Tables

•  No requirement to use provided framework
•  Can read from or write to one or many tables in
   mapper and reducer
•  Can split on arbitrary boundaries instead of region boundaries
•  Make sure to use write buffer in OutputFormat to
   get best performance (do not forget to call
   flushCommits() at the end!)
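
When writing through the native API rather than TableOutputFormat, enable the client-side write buffer yourself; a minimal sketch, with table name, buffer size, and the puts collection assumed:

   HTable table = new HTable(HBaseConfiguration.create(), "target");
   table.setAutoFlush(false);                     // turn on the client-side write buffer
   table.setWriteBufferSize(8 * 1024 * 1024);     // assumed size, tune to your workload

   for (Put put : puts) {                         // puts produced by your mapper/reducer
     table.put(put);                              // buffered and sent in batches
   }

   table.flushCommits();                          // do not forget the final flush!
   table.close();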
ADVANCED TECHNIQUES
Advanced Techniques

•  Key/Table Design
•  DDI
•  Salting
•  Hashing vs. Sequential Keys
•  ColumnFamily vs. Column
•  Using BloomFilter
•  Data Locality
•  checkAndPut() and checkAndDelete()
•  Coprocessors
Coprocessors

•  New addition to feature set
•  Based on talk by Jeff Dean at LADIS 2009
    •  Run arbitrary code on each region in RegionServer
    •  High level call interface for clients
       •  Calls are addressed to rows or ranges of rows while
          Coprocessors client library resolves locations
       •  Calls to multiple rows are automatically split
   •  Provides model for distributed services
       •  Automatic scaling, load balancing, request routing
Coprocessors in HBase

•  Use for efficient computational parallelism
•  Secondary indexing (HBASE-2038)
•  Column Aggregates (HBASE-1512)
    •  SQL-like sum(), avg(), max(), min(), etc.
•  Access control (HBASE-3025, HBASE-3045)
    •  Provide basic access control
•  Table Metacolumns
•  New filtering
    •  predicate pushdown
•  Table/Region access statistics
•  HLog extensions (HBASE-3257)
Coprocessor and RegionObserver

•  The Coprocessor interface defines these hooks
    •  preOpen, postOpen: Called before and after the
       region is reported as online to the master
    •  preFlush, postFlush: Called before and after the
       memstore is flushed into a new store file
    •  preCompact, postCompact: Called before and after
       compaction
    •  preSplit, postSplit: Called before and after the region is split
    •  preClose, postClose: Called before and after the
       region is reported as closed to the master
Coprocessor and RegionObserver

•  The RegionObserver interface defines these hooks:
    •  preGet, postGet: Called before and after a client makes a Get
       request
    •  preExists, postExists: Called before and after the client tests for
       existence using a Get
    •  prePut, postPut: Called before and after the client stores a value
    •  preDelete, postDelete: Called before and after the client deletes a
       value
    •  preScannerOpen, postScannerOpen: Called before and after the
       client opens a new scanner
    •  preScannerNext, postScannerNext: Called before and after the
       client asks for the next row on a scanner
    •  preScannerClose, postScannerClose: Called before and after the
       client closes a scanner
    •  preCheckAndPut, postCheckAndPut: Called before and after the
       client calls checkAndPut()
    •  preCheckAndDelete, postCheckAndDelete: Called before and after
       the client calls checkAndDelete()
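
As an illustration only, a minimal RegionObserver sketch that hooks prePut; the hook signature shown matches the 0.92-era API (it changes in later releases), and the class, family, and qualifier names are made up. Such an observer would typically be registered via the table descriptor or the hbase.coprocessor.region.classes property:

   import org.apache.hadoop.hbase.client.Put;
   import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
   import org.apache.hadoop.hbase.coprocessor.ObserverContext;
   import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
   import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
   import org.apache.hadoop.hbase.util.Bytes;

   public class AuditObserver extends BaseRegionObserver {

     @Override
     public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
         Put put, WALEdit edit, boolean writeToWAL) {
       // runs on the region server before each Put is applied;
       // here: stamp every mutation with an extra audit column
       put.add(Bytes.toBytes("meta"), Bytes.toBytes("audited"), Bytes.toBytes("true"));
     }
   }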
PROJECT STATUS
Current Project Status

•  HBase 0.90.x “Advanced Concepts”
    •  Master Rewrite – More Zookeeper
    •  Intra Row Scanning
    •  Further optimizations on algorithms and data
       structures
           CDH3
•  HBase 0.92.x “Coprocessors”
    •  Multi-DC Replication
    •  Discretionary Access Control
    •  Coprocessors
           CDH4
Current Project Status (cont.)

•  HBase 0.94.x “Performance Release”
    •  Read CRC Improvements
    •  Seek Optimizations
    •  WAL Compression
    •  Prefix Compression (aka Block Encoding)
    •  Atomic Append
    •  Atomic put+delete
    •  Multi Increment and Multi Append
    •  Per-region (i.e. local) Multi-Row Transactions
    •  WALPlayer

         CDH4.x    (soon)
Current Project Status (cont.)

•  HBase 0.96.x “The Singularity”
    •  Protobuf RPC
      •  Rolling Upgrades
      •  Multiversion Access
  •  Metrics V2
  •  Preview Technologies
      •  Snapshots
      •  PrefixTrie Block Encoding



        CDH5 ?
Questions?
Ad

More Related Content

What's hot (20)

Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
Cloudera, Inc.
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
Cloudera, Inc.
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
alexbaranau
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
alexbaranau
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
Jean-Baptiste Poullet
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
Nick Dimiduk
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
Cloudera, Inc.
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
larsgeorge
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
Matthew Blair
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
HBaseCon
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
larsgeorge
 
Apache HBase - Just the Basics
Apache HBase - Just the BasicsApache HBase - Just the Basics
Apache HBase - Just the Basics
HBaseCon
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
Cloudera, Inc.
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
Nick Dimiduk
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBaseCon
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
Cloudera, Inc.
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
Cloudera, Inc.
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
Cloudera, Inc.
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
alexbaranau
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
Nick Dimiduk
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
Cloudera, Inc.
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
larsgeorge
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
Matthew Blair
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
HBaseCon
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
larsgeorge
 
Apache HBase - Just the Basics
Apache HBase - Just the BasicsApache HBase - Just the Basics
Apache HBase - Just the Basics
HBaseCon
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
Cloudera, Inc.
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
Nick Dimiduk
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBaseCon
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
Cloudera, Inc.
 

Viewers also liked (20)

Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks
 
Introduction to HBase
Introduction to HBaseIntroduction to HBase
Introduction to HBase
Avkash Chauhan
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Cloudera, Inc.
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14
Jeremy Walsh
 
HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015
Avinash Ramineni
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
Michael Stack
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
Venu Anuganti
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
Nick Dimiduk
 
HBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBaseHBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBase
Michael Stack
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
Josh Elser
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Cloudera, Inc.
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
DataWorks Summit/Hadoop Summit
 
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceHBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
Cloudera, Inc.
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
Nick Dimiduk
 
Spark + HBase
Spark + HBase Spark + HBase
Spark + HBase
DataWorks Summit/Hadoop Summit
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
DataWorks Summit
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databases
Luis Cipriani
 
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
Cloudera, Inc.
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
enissoz
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
Lars Hofhansl
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Cloudera, Inc.
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14
Jeremy Walsh
 
HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015
Avinash Ramineni
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
Michael Stack
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
Venu Anuganti
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
Nick Dimiduk
 
HBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBaseHBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBase
Michael Stack
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
Josh Elser
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Cloudera, Inc.
 
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceHBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
Cloudera, Inc.
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
Nick Dimiduk
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databases
Luis Cipriani
 
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
Cloudera, Inc.
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
enissoz
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
Lars Hofhansl
 
Ad

Similar to Intro to HBase - Lars George (20)

Nyc hadoop meetup introduction to h base
Nyc hadoop meetup   introduction to h baseNyc hadoop meetup   introduction to h base
Nyc hadoop meetup introduction to h base
智杰 付
 
Hive - A theoretical overview in Detail.pptx
Hive - A theoretical overview in Detail.pptxHive - A theoretical overview in Detail.pptx
Hive - A theoretical overview in Detail.pptx
Mithun DSouza
 
HBase
HBaseHBase
HBase
Pooja Sunkapur
 
Apache Hive
Apache HiveApache Hive
Apache Hive
Amit Khandelwal
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singh
Mayank Singh
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBase
Gokuldas Pillai
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
bddmoscow
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Andrew Brust
 
Conhecendo o Apache HBase
Conhecendo o Apache HBaseConhecendo o Apache HBase
Conhecendo o Apache HBase
Felipe Ferreira
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
dave_revell
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
Bigdatapump
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
Andrew Brust
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptx
BhavanaHotchandani
 
Hadoop Training in Hyderabad
Hadoop Training in HyderabadHadoop Training in Hyderabad
Hadoop Training in Hyderabad
CHENNAKESHAVAKATAGAR
 
Hadoop Training in Hyderabad
Hadoop Training in HyderabadHadoop Training in Hyderabad
Hadoop Training in Hyderabad
Rajitha D
 
Hive ppt on the basis of importance of big data
Hive ppt on the basis of importance of big dataHive ppt on the basis of importance of big data
Hive ppt on the basis of importance of big data
computer87914
 
hive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxhive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptx
vishwasgarade1
 
Techincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseTechincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql database
Rishabh Dugar
 
Nyc hadoop meetup introduction to h base
Nyc hadoop meetup   introduction to h baseNyc hadoop meetup   introduction to h base
Nyc hadoop meetup introduction to h base
智杰 付
 
Hive - A theoretical overview in Detail.pptx
Hive - A theoretical overview in Detail.pptxHive - A theoretical overview in Detail.pptx
Hive - A theoretical overview in Detail.pptx
Mithun DSouza
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singh
Mayank Singh
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBase
Gokuldas Pillai
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
bddmoscow
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Andrew Brust
 
Conhecendo o Apache HBase
Conhecendo o Apache HBaseConhecendo o Apache HBase
Conhecendo o Apache HBase
Felipe Ferreira
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
dave_revell
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
Andrew Brust
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptx
BhavanaHotchandani
 
Hadoop Training in Hyderabad
Hadoop Training in HyderabadHadoop Training in Hyderabad
Hadoop Training in Hyderabad
Rajitha D
 
Hive ppt on the basis of importance of big data
Hive ppt on the basis of importance of big dataHive ppt on the basis of importance of big data
Hive ppt on the basis of importance of big data
computer87914
 
hive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxhive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptx
vishwasgarade1
 
Techincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseTechincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql database
Rishabh Dugar
 
Ad

More from JAX London (20)

Everything I know about software in spaghetti bolognese: managing complexity
Everything I know about software in spaghetti bolognese: managing complexityEverything I know about software in spaghetti bolognese: managing complexity
Everything I know about software in spaghetti bolognese: managing complexity
JAX London
 
Devops with the S for Sharing - Patrick Debois
Devops with the S for Sharing - Patrick DeboisDevops with the S for Sharing - Patrick Debois
Devops with the S for Sharing - Patrick Debois
JAX London
 
Busy Developer's Guide to Windows 8 HTML/JavaScript Apps
Busy Developer's Guide to Windows 8 HTML/JavaScript AppsBusy Developer's Guide to Windows 8 HTML/JavaScript Apps
Busy Developer's Guide to Windows 8 HTML/JavaScript Apps
JAX London
 
It's code but not as we know: Infrastructure as Code - Patrick Debois
It's code but not as we know: Infrastructure as Code - Patrick DeboisIt's code but not as we know: Infrastructure as Code - Patrick Debois
It's code but not as we know: Infrastructure as Code - Patrick Debois
JAX London
 
Locks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael BarkerLocks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael Barker
JAX London
 
Worse is better, for better or for worse - Kevlin Henney
Worse is better, for better or for worse - Kevlin HenneyWorse is better, for better or for worse - Kevlin Henney
Worse is better, for better or for worse - Kevlin Henney
JAX London
 
Java performance: What's the big deal? - Trisha Gee
Java performance: What's the big deal? - Trisha GeeJava performance: What's the big deal? - Trisha Gee
Java performance: What's the big deal? - Trisha Gee
JAX London
 
Clojure made-simple - John Stevenson
Clojure made-simple - John StevensonClojure made-simple - John Stevenson
Clojure made-simple - John Stevenson
JAX London
 
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias Wessendorf
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias WessendorfHTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias Wessendorf
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias Wessendorf
JAX London
 
Play framework 2 : Peter Hilton
Play framework 2 : Peter HiltonPlay framework 2 : Peter Hilton
Play framework 2 : Peter Hilton
JAX London
 
Complexity theory and software development : Tim Berglund
Complexity theory and software development : Tim BerglundComplexity theory and software development : Tim Berglund
Complexity theory and software development : Tim Berglund
JAX London
 
Why FLOSS is a Java developer's best friend: Dave Gruber
Why FLOSS is a Java developer's best friend: Dave GruberWhy FLOSS is a Java developer's best friend: Dave Gruber
Why FLOSS is a Java developer's best friend: Dave Gruber
JAX London
 
Akka in Action: Heiko Seeburger
Akka in Action: Heiko SeeburgerAkka in Action: Heiko Seeburger
Akka in Action: Heiko Seeburger
JAX London
 
NoSQL Smackdown 2012 : Tim Berglund
NoSQL Smackdown 2012 : Tim BerglundNoSQL Smackdown 2012 : Tim Berglund
NoSQL Smackdown 2012 : Tim Berglund
JAX London
 
Closures, the next "Big Thing" in Java: Russel Winder
Closures, the next "Big Thing" in Java: Russel WinderClosures, the next "Big Thing" in Java: Russel Winder
Closures, the next "Big Thing" in Java: Russel Winder
JAX London
 
Java and the machine - Martijn Verburg and Kirk Pepperdine
Java and the machine - Martijn Verburg and Kirk PepperdineJava and the machine - Martijn Verburg and Kirk Pepperdine
Java and the machine - Martijn Verburg and Kirk Pepperdine
JAX London
 
Mongo DB on the JVM - Brendan McAdams
Mongo DB on the JVM - Brendan McAdamsMongo DB on the JVM - Brendan McAdams
Mongo DB on the JVM - Brendan McAdams
JAX London
 
New opportunities for connected data - Ian Robinson
New opportunities for connected data - Ian RobinsonNew opportunities for connected data - Ian Robinson
New opportunities for connected data - Ian Robinson
JAX London
 
HTML5 Websockets and Java - Arun Gupta
HTML5 Websockets and Java - Arun GuptaHTML5 Websockets and Java - Arun Gupta
HTML5 Websockets and Java - Arun Gupta
JAX London
 
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian Plosker
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian PloskerThe Big Data Con: Why Big Data is a Problem, not a Solution - Ian Plosker
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian Plosker
JAX London
 
Everything I know about software in spaghetti bolognese: managing complexity
Everything I know about software in spaghetti bolognese: managing complexityEverything I know about software in spaghetti bolognese: managing complexity
Everything I know about software in spaghetti bolognese: managing complexity
JAX London
 
Devops with the S for Sharing - Patrick Debois
Devops with the S for Sharing - Patrick DeboisDevops with the S for Sharing - Patrick Debois
Devops with the S for Sharing - Patrick Debois
JAX London
 
Busy Developer's Guide to Windows 8 HTML/JavaScript Apps
Busy Developer's Guide to Windows 8 HTML/JavaScript AppsBusy Developer's Guide to Windows 8 HTML/JavaScript Apps
Busy Developer's Guide to Windows 8 HTML/JavaScript Apps
JAX London
 
It's code but not as we know: Infrastructure as Code - Patrick Debois
It's code but not as we know: Infrastructure as Code - Patrick DeboisIt's code but not as we know: Infrastructure as Code - Patrick Debois
It's code but not as we know: Infrastructure as Code - Patrick Debois
JAX London
 
Locks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael BarkerLocks? We Don't Need No Stinkin' Locks - Michael Barker
Locks? We Don't Need No Stinkin' Locks - Michael Barker
JAX London
 
Worse is better, for better or for worse - Kevlin Henney
Worse is better, for better or for worse - Kevlin HenneyWorse is better, for better or for worse - Kevlin Henney
Worse is better, for better or for worse - Kevlin Henney
JAX London
 
Java performance: What's the big deal? - Trisha Gee
Java performance: What's the big deal? - Trisha GeeJava performance: What's the big deal? - Trisha Gee
Java performance: What's the big deal? - Trisha Gee
JAX London
 
Clojure made-simple - John Stevenson
Clojure made-simple - John StevensonClojure made-simple - John Stevenson
Clojure made-simple - John Stevenson
JAX London
 
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias Wessendorf
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias WessendorfHTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias Wessendorf
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias Wessendorf
JAX London
 
Play framework 2 : Peter Hilton
Play framework 2 : Peter HiltonPlay framework 2 : Peter Hilton
Play framework 2 : Peter Hilton
JAX London
 
Complexity theory and software development : Tim Berglund
Complexity theory and software development : Tim BerglundComplexity theory and software development : Tim Berglund
Complexity theory and software development : Tim Berglund
JAX London
 
Why FLOSS is a Java developer's best friend: Dave Gruber
Why FLOSS is a Java developer's best friend: Dave GruberWhy FLOSS is a Java developer's best friend: Dave Gruber
Why FLOSS is a Java developer's best friend: Dave Gruber
JAX London
 
Akka in Action: Heiko Seeburger
Akka in Action: Heiko SeeburgerAkka in Action: Heiko Seeburger
Akka in Action: Heiko Seeburger
JAX London
 
NoSQL Smackdown 2012 : Tim Berglund
NoSQL Smackdown 2012 : Tim BerglundNoSQL Smackdown 2012 : Tim Berglund
NoSQL Smackdown 2012 : Tim Berglund
JAX London
 
Closures, the next "Big Thing" in Java: Russel Winder
Closures, the next "Big Thing" in Java: Russel WinderClosures, the next "Big Thing" in Java: Russel Winder
Closures, the next "Big Thing" in Java: Russel Winder
JAX London
 
Java and the machine - Martijn Verburg and Kirk Pepperdine
Java and the machine - Martijn Verburg and Kirk PepperdineJava and the machine - Martijn Verburg and Kirk Pepperdine
Java and the machine - Martijn Verburg and Kirk Pepperdine
JAX London
 
Mongo DB on the JVM - Brendan McAdams
Mongo DB on the JVM - Brendan McAdamsMongo DB on the JVM - Brendan McAdams
Mongo DB on the JVM - Brendan McAdams
JAX London
 
New opportunities for connected data - Ian Robinson
New opportunities for connected data - Ian RobinsonNew opportunities for connected data - Ian Robinson
New opportunities for connected data - Ian Robinson
JAX London
 
HTML5 Websockets and Java - Arun Gupta
HTML5 Websockets and Java - Arun GuptaHTML5 Websockets and Java - Arun Gupta
HTML5 Websockets and Java - Arun Gupta
JAX London
 
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian Plosker
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian PloskerThe Big Data Con: Why Big Data is a Problem, not a Solution - Ian Plosker
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian Plosker
JAX London
 

Recently uploaded (20)

Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
The Microsoft Excel Parts Presentation.pdf
The Microsoft Excel Parts Presentation.pdfThe Microsoft Excel Parts Presentation.pdf
The Microsoft Excel Parts Presentation.pdf
YvonneRoseEranista
 
Build With AI - In Person Session Slides.pdf
Build With AI - In Person Session Slides.pdfBuild With AI - In Person Session Slides.pdf
Build With AI - In Person Session Slides.pdf
Google Developer Group - Harare
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Cyntexa
 
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of ExchangesJignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah Innovator
 
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
Ivano Malavolta
 
Financial Services Technology Summit 2025
Financial Services Technology Summit 2025Financial Services Technology Summit 2025
Financial Services Technology Summit 2025
Ray Bugg
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
Agentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community MeetupAgentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community Meetup
Manoj Batra (1600 + Connections)
 
Does Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should KnowDoes Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should Know
Pornify CC
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Raffi Khatchadourian
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdfAutomate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Precisely
 
fennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solutionfennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solution
shallal2
 
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Markus Eisele
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
The Microsoft Excel Parts Presentation.pdf
The Microsoft Excel Parts Presentation.pdfThe Microsoft Excel Parts Presentation.pdf
The Microsoft Excel Parts Presentation.pdf
YvonneRoseEranista
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Cyntexa
 
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of ExchangesJignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah Innovator
 
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
Ivano Malavolta
 
Financial Services Technology Summit 2025
Financial Services Technology Summit 2025Financial Services Technology Summit 2025
Financial Services Technology Summit 2025
Ray Bugg
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
Does Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should KnowDoes Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should Know
Pornify CC
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Raffi Khatchadourian
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdfAutomate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Precisely
 
fennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solutionfennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solution
shallal2
 
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Markus Eisele
 

Intro to HBase - Lars George

  • 11. Why HBase and not …? •  Question: Why HBase and not <put-your-favorite- nosql-solution-here>? •  What else is there? •  Key/value stores •  Document-oriented stores •  Column-oriented stores •  Graph-oriented stores •  Features to ask for •  In memory or persistent? •  Strict or eventual consistency? •  Distributed or single machine (or afterthought)? •  Designed for read and/or write speeds? •  How does it scale? (if that is what you need)
  • 12. What is HBase?
    •  Distributed
    •  Column-Oriented
    •  Multi-Dimensional
    •  High-Availability (CAP anyone?)
    •  High-Performance
    •  Storage System
    Project Goals: Billions of Rows * Millions of Columns * Thousands of Versions; Petabytes across thousands of commodity servers
  • 13. HBase is not…
    •  An SQL database
      •  No joins, no query engine, no types, no SQL
      •  Transactions and secondary indexes only as add-ons, and still immature
    •  A drop-in replacement for your RDBMS
      •  You must be OK with the RDBMS anti-schema: denormalized data, wide and sparsely populated tables
      •  Just say “no” to your inner DBA
    Keyword: Impedance Match
  • 24. HBase Tables
    •  Tables are sorted by the Row Key in lexicographical order
    •  Table schema only defines its Column Families
      •  Each family consists of any number of Columns
      •  Each column consists of any number of Versions
    •  Columns only exist when inserted, NULLs are free
    •  Columns within a family are sorted and stored together
    •  Everything except table names is byte[]
    (Table, Row, Family:Column, Timestamp) → Value
  • 25. Column Family vs. Column
    •  Use only a few column families
      •  Each family causes many files that need to stay open per region, plus class overhead per family
    •  Best used when there is a logical separation between data and meta columns
    •  Sorting per family can be used to convey application logic or access patterns
  • 26. HBase Architecture
    •  A table is made up of any number of regions
    •  A region is specified by its startKey and endKey
      •  Empty table: (Table, NULL, NULL)
      •  Two-region table: (Table, NULL, “com.cloudera.www”) and (Table, “com.cloudera.www”, NULL)
    •  Each region may live on a different node and is made up of several HDFS files and blocks, each of which is replicated by Hadoop
  • 27. HBase Architecture (cont.)
    •  Two types of HBase nodes: Master and RegionServer
    •  Special tables -ROOT- and .META. store schema information and region locations
    •  The Master server is responsible for RegionServer monitoring as well as assignment and load balancing of regions
    •  Uses ZooKeeper as its distributed coordination service
      •  Manages Master election and server availability
  • 28. Web Crawl Example
    •  Canonical use case for BigTable: store web crawl data
    •  Table webtable with families content and meta
    •  Row is the reversed URL, with columns
      •  content:data stores the raw crawled data
      •  meta:language stores the HTTP language header
      •  meta:type stores the HTTP content-type header
    •  While processing the raw data for hyperlinks and images, add families links and images
      •  links:<rurl> column for each hyperlink
      •  images:<rurl> column for each image
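To make the data model concrete, here is a small sketch of how such a row could be written with the Java client API of that release line. The table layout follows the slide, but the URL, header values, and page content are illustrative placeholders, not taken from the original deck.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WebtableExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "webtable");

        // Row key is the reversed URL so pages of one domain sort next to each other
        byte[] row = Bytes.toBytes("com.cloudera.www/index.html");

        Put put = new Put(row);
        // content:data holds the raw crawled page
        put.add(Bytes.toBytes("content"), Bytes.toBytes("data"),
            Bytes.toBytes("<html>...</html>"));
        // meta:language and meta:type hold selected HTTP headers
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("language"), Bytes.toBytes("en"));
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("type"), Bytes.toBytes("text/html"));
        table.put(put);
        table.close();
      }
    }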
  • 29. HBase Clients
    •  Native Java Client/API
    •  Non-Java clients
      •  REST server
      •  Avro server
      •  Thrift server
      •  Jython, Scala, Groovy DSL
    •  TableInputFormat/TableOutputFormat for MapReduce
      •  HBase as MapReduce source and/or target
    •  HBase Shell
      •  JRuby shell adding get, put, scan and admin calls
  • 30. Java API
    •  CRUD
      •  get: retrieve an entire or partial row (R)
      •  put: create and update a row (CU)
      •  delete: delete a cell, column, columns, or row (D)

    Result get(Get get) throws IOException;
    void put(Put put) throws IOException;
    void delete(Delete delete) throws IOException;
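A minimal sketch of these three calls, assuming a hypothetical table "test" with a column family "cf" (the names and values are illustrative, not from the slides):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CrudExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test");

        // Create/update (CU): write one cell
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
        table.put(put);

        // Read (R): fetch the row, optionally narrowed to one column
        Get get = new Get(Bytes.toBytes("row1"));
        get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"));
        Result result = table.get(get);
        System.out.println(Bytes.toString(
            result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col1"))));

        // Delete (D): remove a single column, a family, or the whole row
        Delete delete = new Delete(Bytes.toBytes("row1"));
        delete.deleteColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"));
        table.delete(delete);

        table.close();
      }
    }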
  • 31. Java API (cont.)
    •  CRUD+SI
      •  scan: scan any number of rows (S)
      •  increment: increment a column value (I)

    ResultScanner getScanner(Scan scan) throws IOException;
    Result increment(Increment increment) throws IOException;
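For illustration, a short sketch of a range scan and a counter increment against the same hypothetical table and family as above:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanIncrementExample {
      static void scanAndIncrement(HTable table) throws IOException {
        // Scan (S): iterate the rows between a start and a stop key
        Scan scan = new Scan(Bytes.toBytes("row0"), Bytes.toBytes("row9"));
        scan.addFamily(Bytes.toBytes("cf"));
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            System.out.println(Bytes.toString(r.getRow()));
          }
        } finally {
          scanner.close();  // always release the server-side scanner
        }

        // Increment (I): atomically add to a counter column
        Increment inc = new Increment(Bytes.toBytes("row1"));
        inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);
        table.increment(inc);
      }
    }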
  • 32. Java API (cont.)
    •  CRUD+SI+CAS
      •  Atomic compare-and-swap (CAS)
      •  Combined get, check, and put operation
      •  Helps to overcome the lack of full transactions
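A sketch of the combined get-check-put call, assuming the same hypothetical table: the put is only applied when the current cell value still matches the expected one.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CasExample {
      static boolean swap(HTable table) throws IOException {
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("new"));
        return table.checkAndPut(
            Bytes.toBytes("row1"),   // row
            Bytes.toBytes("cf"),     // family
            Bytes.toBytes("col1"),   // qualifier
            Bytes.toBytes("old"),    // expected current value
            put);                    // true when the check passed and the put was applied
      }
    }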
  • 33. Batch Operations
    •  Support Get, Put, and Delete
    •  Reduce network round-trips
    •  If possible, batch operations to the server to gain better overall throughput

    void batch(List<Row> actions, Object[] results) throws IOException, InterruptedException;
    Object[] batch(List<Row> actions) throws IOException, InterruptedException;
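A rough usage sketch of the first form, mixing the operation types named above (row keys and values are again placeholders):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchExample {
      static void sendBatch(HTable table) throws IOException, InterruptedException {
        List<Row> actions = new ArrayList<Row>();

        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("1"));
        actions.add(put);
        actions.add(new Get(Bytes.toBytes("row2")));
        actions.add(new Delete(Bytes.toBytes("row3")));

        // All actions travel in as few round-trips as possible;
        // results[i] holds the outcome (or exception) of actions.get(i)
        Object[] results = new Object[actions.size()];
        table.batch(actions, results);
      }
    }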
  • 34. Filters
    •  Can be used with Get and Scan operations
    •  Server-side hinting
    •  Reduce the data transferred to the client
    •  Filters are no guarantee for fast scans
      •  Still a full table scan in the worst-case scenario
      •  Might have to implement your own
    •  Filters can hint at the next row key
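For example, a prefix filter evaluated on the region servers (a sketch; the prefix is illustrative):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilterExample {
      // Only rows whose key starts with the prefix are returned; the filter
      // runs on the region servers, so less data crosses the network.
      static void scanWithPrefix(HTable table) throws IOException {
        Scan scan = new Scan();
        scan.setFilter(new PrefixFilter(Bytes.toBytes("com.cloudera")));
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            System.out.println(Bytes.toString(r.getRow()));
          }
        } finally {
          scanner.close();
        }
      }
    }

Note that without a start row the scan may still walk large parts of the table, which is exactly the worst case the slide warns about.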
  • 35. HBase Extensions
    •  Hive, Pig, Cascading
      •  Hadoop-targeted MapReduce tools with HBase integration
    •  Sqoop
      •  Read and write to HBase for further processing in Hadoop
    •  HBase Explorer, Nutch, Heritrix
    •  SpringData
    •  Toad
  • 36. History of HBase
    •  November 2006: Google releases the paper on BigTable
    •  February 2007: Initial HBase prototype created as Hadoop contrib
    •  October 2007: First “usable” HBase (Hadoop 0.15.0)
    •  January 2008: Hadoop becomes a TLP, HBase becomes a subproject
    •  October 2008: HBase 0.18.1 released
    •  January 2009: HBase 0.19.0
    •  September 2009: HBase 0.20.0 released (Performance Release)
    •  May 2010: HBase becomes a TLP
    •  June 2010: HBase 0.89.20100621, first developer release
    •  May 2011: HBase 0.90.3 release
  • 37. HBase Users
    •  Adobe
    •  eBay
    •  Facebook
    •  Mozilla (Socorro)
    •  Trend Micro (Advanced Threat Research)
    •  Twitter
    •  Yahoo!
    •  …
  • 41. HBase Architecture (cont.)
    •  Based on Log-Structured Merge-Trees (LSM-Trees)
    •  Inserts go to the write-ahead log first
    •  Data is stored in memory and flushed to disk at regular intervals or based on size
    •  Small flushes are merged in the background to keep the number of files small
    •  Reads check the memory stores first and the disk-based files second
    •  Deletes are handled with “tombstone” markers
    •  Atomicity at the row level, no matter how many columns
      •  Keeps the locking model simple
  • 44. MapReduce with HBase
    •  Framework to use HBase as source and/or sink for MapReduce jobs
    •  Thin layer over the native Java API
    •  Provides helper classes to set up jobs more easily

    TableMapReduceUtil.initTableMapperJob(
        “test”, scan, MyMapper.class,
        ImmutableBytesWritable.class, RowResult.class, job);
    TableMapReduceUtil.initTableReducerJob(
        “table”, MyReducer.class, job);
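To show where such a setup call fits, here is a hedged sketch of a map-only row-counting job; the table name, caching values, and counter type are assumptions, not part of the original deck.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Job;

    public class RowCountMapper
        extends TableMapper<ImmutableBytesWritable, IntWritable> {

      private static final IntWritable ONE = new IntWritable(1);

      @Override
      protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
          throws IOException, InterruptedException {
        // The framework hands us one row key plus the columns selected by the Scan
        context.write(rowKey, ONE);
      }

      public static Job createJob(Configuration conf) throws IOException {
        Job job = new Job(conf, "row count");
        job.setJarByClass(RowCountMapper.class);
        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner caching helps MapReduce
        scan.setCacheBlocks(false);  // avoid churning the block cache on full scans
        TableMapReduceUtil.initTableMapperJob(
            "test", scan, RowCountMapper.class,
            ImmutableBytesWritable.class, IntWritable.class, job);
        job.setNumReduceTasks(0);    // map-only, as the slides note is often enough
        return job;
      }
    }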
  • 45. MapReduce with HBase (cont.)
    •  Special use case with regard to Hadoop
      •  Tables are sorted and have unique keys
      •  Often we do not need a Reducer phase
      •  Combiner not needed
    •  Need to make sure load is distributed properly by randomizing keys (or use bulk import)
    •  Partial or full table scans possible
    •  Scans are very efficient as they make use of block caches
      •  But then make sure you do not create too much churn, or better, switch caching off when doing full table scans
    •  Can use filters to limit the rows being processed
  • 46. TableInputFormat
    •  Transforms an HBase table into a source for MapReduce jobs
    •  Internally uses a TableRecordReader which wraps a Scan instance
      •  Supports restarts to handle temporary issues
    •  Splits the table by region boundaries and stores the current region locality
  • 47. TableOutputFormat
    •  Allows using an HBase table as an output target
    •  Put and Delete support from the mapper or reducer class
    •  Uses TableOutputCommitter to write data
    •  Disables auto-flush on the table to make use of the client-side write buffer
    •  Handles the final flush in close()
  • 48. HFileOutputFormat
    •  Used to bulk load data into HBase
      •  Bypasses the normal API and generates low-level store files
      •  Prepares files for the final bulk insert
    •  Needs special handling of sort order and partitioning
    •  Only supports one column family (for now)
    •  Can load bulk updates into existing tables
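A rough job-setup sketch for this path; the table name is a placeholder, the mapper (not shown) would emit the row keys and Puts/KeyValues, and the generated files are afterwards moved into the table with the bulk-load tool.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class BulkLoadSetup {
      static Job configure(Configuration conf) throws Exception {
        Job job = new Job(conf, "bulk load");
        job.setJarByClass(BulkLoadSetup.class);
        HTable table = new HTable(conf, "test");
        // Sets the output format, reducer, and total-order partitioner so the
        // generated store files match the table's sort order and region boundaries
        HFileOutputFormat.configureIncrementalLoad(job, table);
        // After the job finishes, load the files with the
        // LoadIncrementalHFiles ("completebulkload") tool
        return job;
      }
    }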
  • 49. MapReduce Helper
    •  TableMapReduceUtil
    •  IdentityTableMapper
      •  Passes on key and value, where value is a Result instance and key is set to value.getRow()
    •  IdentityTableReducer
      •  Stores values into HBase; they must be Put or Delete instances
    •  HRegionPartitioner
      •  Not set by default; use it to control partitioning at the Hadoop level
  • 50. Custom MapReduce over Tables
    •  No requirement to use the provided framework
    •  Can read from or write to one or many tables in the mapper and reducer
    •  Can split not on regions but on arbitrary boundaries
    •  Make sure to use the write buffer in the OutputFormat to get the best performance (do not forget to call flushCommits() at the end!)
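A sketch of that last point (buffer size and row layout are assumptions): enable the client-side write buffer, let puts accumulate, and flush the remainder explicitly at the end.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WriteBufferExample {
      static void bufferedWrites(HTable table) throws IOException {
        table.setAutoFlush(false);                 // enable the client-side write buffer
        table.setWriteBufferSize(2 * 1024 * 1024); // flush automatically at ~2 MB
        for (int i = 0; i < 10000; i++) {
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("v" + i));
          table.put(put);                          // buffered, not sent immediately
        }
        table.flushCommits();                      // do not forget the final flush!
      }
    }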
  • 52. Advanced Techniques
    •  Key/Table Design
      •  DDI
      •  Salting
      •  Hashing vs. Sequential Keys
    •  ColumnFamily vs. Column
    •  Using BloomFilter
    •  Data Locality
    •  checkAndPut() and checkAndDelete()
    •  Coprocessors
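The salting and hashing ideas can be sketched as follows; the bucket count and the choice of MD5 are illustrative assumptions, not prescriptions from the deck.

    import java.security.MessageDigest;
    import java.util.Arrays;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowKeyDesign {
      // Salting: prepend a small, deterministic bucket byte so that otherwise
      // monotonically increasing keys (e.g. timestamps) spread across regions
      static byte[] saltedKey(byte[] originalKey, int buckets) {
        byte salt = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % buckets);
        return Bytes.add(new byte[] { salt }, originalKey);
      }

      // Hashing: prefix with a digest of the key for a near-random distribution,
      // trading away meaningful sequential range scans
      static byte[] hashedKey(byte[] originalKey) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return Bytes.add(md5.digest(originalKey), originalKey);
      }
    }

Salted keys keep ranges scannable per bucket at the cost of one extra scan per bucket; fully hashed keys give the most even write distribution but give up range scans entirely.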
  • 53. Coprocessors
    •  New addition to the feature set
      •  Based on a talk by Jeff Dean at LADIS 2009
    •  Run arbitrary code on each region in the RegionServer
      •  High-level call interface for clients
      •  Calls are addressed to rows or ranges of rows, while the coprocessor client library resolves locations
      •  Calls to multiple rows are automatically split
    •  Provides a model for distributed services
      •  Automatic scaling, load balancing, request routing
  • 54. Coprocessors in HBase
    •  Use for efficient computational parallelism
    •  Secondary indexing (HBASE-2038)
    •  Column aggregates (HBASE-1512)
      •  SQL-like sum(), avg(), max(), min(), etc.
    •  Access control (HBASE-3025, HBASE-3045)
      •  Provide basic access control
    •  Table metacolumns
    •  New filtering
      •  Predicate pushdown
    •  Table/region access statistics
    •  HLog extensions (HBASE-3257)
  • 55. Coprocessor and RegionObserver
    •  The Coprocessor interface defines these hooks:
      •  preOpen, postOpen: Called before and after the region is reported as online to the master
      •  preFlush, postFlush: Called before and after the memstore is flushed into a new store file
      •  preCompact, postCompact: Called before and after compaction
      •  preSplit, postSplit: Called before and after the region is split
      •  preClose, postClose: Called before and after the region is reported as closed to the master
  • 56. Coprocessor and RegionObserver
    •  The RegionObserver interface defines these hooks:
      •  preGet, postGet: Called before and after a client makes a Get request
      •  preExists, postExists: Called before and after the client tests for existence using a Get
      •  prePut, postPut: Called before and after the client stores a value
      •  preDelete, postDelete: Called before and after the client deletes a value
      •  preScannerOpen, postScannerOpen: Called before and after the client opens a new scanner
      •  preScannerNext, postScannerNext: Called before and after the client asks for the next row on a scanner
      •  preScannerClose, postScannerClose: Called before and after the client closes a scanner
      •  preCheckAndPut, postCheckAndPut: Called before and after the client calls checkAndPut()
      •  preCheckAndDelete, postCheckAndDelete: Called before and after the client calls checkAndDelete()
  • 58. Current Project Status
    •  HBase 0.90.x “Advanced Concepts” (CDH3)
      •  Master Rewrite – More ZooKeeper
      •  Intra-Row Scanning
      •  Further optimizations on algorithms and data structures
    •  HBase 0.92.x “Coprocessors” (CDH4)
      •  Multi-DC Replication
      •  Discretionary Access Control
      •  Coprocessors
  • 59. Current Project Status (cont.)
    •  HBase 0.94.x “Performance Release” (CDH4.x, soon)
      •  Read CRC Improvements
      •  Seek Optimizations
      •  WAL Compression
      •  Prefix Compression (aka Block Encoding)
      •  Atomic Append
      •  Atomic put+delete
      •  Multi Increment and Multi Append
      •  Per-region (i.e. local) Multi-Row Transactions
      •  WALPlayer
  • 60. Current Project Status (cont.)
    •  HBase 0.96.x “The Singularity” (CDH5?)
      •  Protobuf RPC
      •  Rolling Upgrades
      •  Multiversion Access
      •  Metrics V2
      •  Preview Technologies
        •  Snapshots
        •  PrefixTrie Block Encoding