HBase Sizing Notes
                                     Lars George
                         Director EMEA Services @ Cloudera
                                 lars@cloudera.com




Saturday, June 30, 12
Competing Resources

                        • Reads and Writes compete for the same
                          low-level resources
                         ‣ Disk (HDFS) and Network I/O
                         ‣ RPC Handlers and Threads
                        • Otherwise they exercise completely
                          separate code paths


Memory Sharing
                        • By default, every region server divides its
                          memory (i.e. the given maximum heap) into
                          ‣ 40% for the in-memory stores (write ops)
                          ‣ 20% for block caching (read ops)
                          ‣ the remaining space (here 40%), which goes
                            towards regular Java heap usage (objects etc.)
                        • The memory shares need to be tweaked
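A minimal sketch of the default split above, assuming the 40% / 20% shares from this slide (actual defaults differ between HBase versions). In HBase releases of that time the two shares were, to our understanding, set by `hbase.regionserver.global.memstore.upperLimit` and `hfile.block.cache.size`; verify the names for your version. The helper `heap_split` is hypothetical.

```python
def heap_split(heap_mb: float, memstore_frac: float = 0.4,
               block_cache_frac: float = 0.2) -> dict:
    """Split a region server heap (in MB) into memstore, block cache and
    the remainder left for regular Java objects (hypothetical helper)."""
    if memstore_frac + block_cache_frac >= 1.0:
        # Nothing would be left for the objects the server itself needs.
        raise ValueError("memstore and block cache shares leave no Java heap")
    memstore = heap_mb * memstore_frac
    block_cache = heap_mb * block_cache_frac
    return {
        "memstore": memstore,
        "block_cache": block_cache,
        "java_heap": heap_mb - memstore - block_cache,
    }

# A 10GB heap with the default shares.
print(heap_split(10 * 1024))
```

Raising one share always takes memory from the others, which is why the tweaking has to be balanced against overall heap usage.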
Reads
                        • Locate and route the request to the appropriate
                          region server
                          ‣ The client caches location information for faster
                            lookups ➜ consider the prefetching option
                            for fast warmups
                        • Eliminate store files where possible, using time
                          ranges or Bloom filters
                        • Try the block cache; if the block is missing,
                          load it from disk
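The read steps can be sketched as a toy model, assuming store files are ordered newest first and that a file's first match is the answer (real HBase merges versions across files). `StoreFile` and `read` are hypothetical names, and an exact set stands in for the probabilistic Bloom filter.

```python
from dataclasses import dataclass, field


@dataclass
class StoreFile:
    """Toy stand-in for an HBase store file (hypothetical)."""
    name: str
    min_ts: int
    max_ts: int
    rows: dict                      # row key -> value, stands in for data blocks
    bloom: set = field(init=False)  # exact set standing in for a Bloom filter

    def __post_init__(self):
        self.bloom = set(self.rows)


def read(row, ts_range, store_files, block_cache):
    """Return (value, files_read_from_disk) for one row lookup."""
    lo, hi = ts_range
    files_read = []
    for sf in store_files:
        # Skip files whose time range cannot overlap the query.
        if sf.max_ts < lo or sf.min_ts > hi:
            continue
        # Skip files whose Bloom filter rules the row out.
        if row not in sf.bloom:
            continue
        if row not in sf.rows:  # a real Bloom filter can give false positives
            continue
        key = (sf.name, row)
        # Try the block cache first; only a miss costs a disk read.
        if key not in block_cache:
            files_read.append(sf.name)
            block_cache[key] = sf.rows[row]
        return block_cache[key], files_read
    return None, files_read
```

Each skipped file is a disk access avoided, which is why time ranges and Bloom filters matter most when a region holds many store files.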

Block Cache
                        • Use the exported metrics to gauge the effectiveness
                          of the block cache
                         ‣ Check fill and eviction rates, as well as hit
                            ratios ➜ random reads are not ideal
                        • Tweak the size up or down as needed, but watch
                          overall heap usage
                        • You absolutely need the block cache
                         ‣ Set it to at least 10% for short-term benefits
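To see why random reads hurt the hit ratio, here is a minimal LRU simulation, assuming a plain LRU policy and block IDs as integers (HBase's actual cache implementation differs). With uniform random reads over a key space much larger than the cache, the hit ratio settles near capacity / key space; a small hot set stays almost entirely cached. `LRUBlockCache` and `simulate` are hypothetical.

```python
import random
from collections import OrderedDict


class LRUBlockCache:
    """Minimal LRU cache counting hits, misses and evictions (hypothetical)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get(self, block_id: int) -> None:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.blocks[block_id] = True  # "load from disk" into the cache
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
            self.evictions += 1

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(key_space: int, reads: int = 10_000, capacity: int = 100,
             seed: int = 42) -> LRUBlockCache:
    rng = random.Random(seed)
    cache = LRUBlockCache(capacity)
    for _ in range(reads):
        cache.get(rng.randrange(key_space))
    return cache


print("random reads:", simulate(1000).hit_ratio())
print("hot set:     ", simulate(50).hit_ratio())
```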
Writes
                        •   The cluster size is often determined by the
                            write performance
                        •   HBase uses log-structured merge trees, which
                            ‣   Store mutations in the in-memory store and the
                                write-ahead log
                            ‣   Flush out aggregated, sorted maps at a specified
                                threshold - or - when under pressure
                            ‣   Discard logs with no pending edits
                            ‣   Perform regular compactions of store files
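The four LSM steps can be illustrated with a toy single-store model. This is a sketch under simplifying assumptions: the flush threshold counts keys instead of bytes, log entries are discarded individually (HBase discards whole log files), and compaction merges everything. `ToyLSMStore` is hypothetical.

```python
class ToyLSMStore:
    """Hypothetical single-store LSM sketch: WAL + memstore + sorted files."""

    def __init__(self, flush_threshold=3, compaction_threshold=3):
        self.flush_threshold = flush_threshold
        self.compaction_threshold = compaction_threshold
        self.wal = []          # append-only (seq_id, key, value) entries
        self.memstore = {}
        self.store_files = []  # each file is a sorted list of (key, value)
        self.seq_id = 0

    def put(self, key, value):
        self.seq_id += 1
        self.wal.append((self.seq_id, key, value))  # log first, for durability
        self.memstore[key] = value                  # then the in-memory store
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as one sorted, immutable file.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        flushed_up_to = self.seq_id
        # Drop log entries whose edits are now persisted in store files.
        self.wal = [e for e in self.wal if e[0] > flushed_up_to]
        if len(self.store_files) >= self.compaction_threshold:
            self.compact()

    def compact(self):
        # Merge all files; later files hold newer edits and win.
        merged = {}
        for f in self.store_files:
            merged.update(f)
        self.store_files = [sorted(merged.items())]

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.store_files):  # newest file first
            for k, v in f:
                if k == key:
                    return v
        return None
```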

Write Performance
                        • Many factors contribute to the overall write
                          performance of a cluster
                          ‣ Key distribution ➜ Avoid region hotspots
                          ‣ Handlers ➜ Do not let them pile up too early
                          ‣ Write-ahead log ➜ Bottleneck #1
                          ‣ Compactions ➜ If badly tuned, they can cause
                            ever-increasing background noise
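One common way to avoid a region hotspot for monotonically increasing keys (e.g. timestamps) is salting: prefixing each row key with a deterministic bucket number. The slides do not prescribe this technique; it is shown here as an assumption, and `salted_key` is a hypothetical helper. The trade-off is that range scans must fan out over all buckets.

```python
import hashlib
from collections import Counter


def salted_key(row_key: str, buckets: int) -> str:
    """Prefix a row key with a deterministic bucket id (hypothetical scheme)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}-{row_key}"


# Sequential timestamp keys would all land in the last region;
# salted, they spread across eight key ranges.
keys = [str(1340000000 + i) for i in range(1000)]
spread = Counter(salted_key(k, 8)[:2] for k in keys)
print(sorted(spread.items()))
```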

Write-Ahead Log
                        • Currently only one per region server
                         ‣ Shared across all stores (i.e. column
                            families)
                         ‣ Synchronized on file append calls
                        • Work is being done to mitigate this
                         ‣ WAL compression
                         ‣ Multiple WALs per region server ➜ Start
                            more than one region server per node?
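The effect of a single shared, synchronized log can be sketched with one lock that every store must take to append. This toy model (hypothetical `SharedWAL`) only shows that all families' writes are serialized into one sequence; it does not model HDFS sync latency.

```python
import threading


class SharedWAL:
    """Toy model of one write-ahead log shared by all stores (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = []
        self._next_seq = 0

    def append(self, family: str, payload: bytes) -> int:
        # Every column family funnels through this single lock,
        # so appends from all stores are serialized.
        with self._lock:
            self._next_seq += 1
            self.entries.append((self._next_seq, family, len(payload)))
            return self._next_seq


def writer(wal: SharedWAL, family: str, n: int) -> None:
    for _ in range(n):
        wal.append(family, b"x" * 10)


wal = SharedWAL()
threads = [threading.Thread(target=writer, args=(wal, f"cf{i}", 1000))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four writers, one per family, end up with a single strictly ordered sequence of log entries, which is exactly the contention point multiple WALs are meant to relieve.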

Write-Ahead Log (cont.)
                        • Size set to 95% of the default block size
                          ‣ 64MB or 128MB, but check your config!
                        • Keep the number of logs low to reduce recovery time
                          ‣ Limit set to 32, but can be increased
                        • Increase the size of the logs - and/or - increase the
                          number of logs before blocking
                        • Compute the number based on fill distribution
                          and flush frequencies
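A rough estimate of how many logs hold unflushed edits at once, assuming a steady write rate and that every edit is flushed within a fixed interval of being written. The helper names are hypothetical. To our understanding, the 95% roll size and the limit of 32 correspond to `hbase.regionserver.logroll.multiplier` and `hbase.regionserver.maxlogs` in HBase of that era; check your version's configuration.

```python
import math


def wal_roll_size_mb(block_size_mb: float, multiplier: float = 0.95) -> float:
    # Logs are rolled at about 95% of the block size, per the slide.
    return block_size_mb * multiplier


def logs_needed(write_rate_mb_s: float, flush_interval_s: float,
                block_size_mb: float) -> int:
    """Logs holding unflushed edits at steady state (hypothetical model)."""
    unflushed_mb = write_rate_mb_s * flush_interval_s
    return math.ceil(unflushed_mb / wal_roll_size_mb(block_size_mb))


MAX_LOGS = 32  # the limit mentioned above
for rate in (10, 20):
    n = logs_needed(rate, flush_interval_s=300, block_size_mb=128)
    print(f"{rate} MB/s -> {n} logs", "(over limit)" if n > MAX_LOGS else "")
```

When the estimate exceeds the limit, either the logs get larger (bigger block size) or the limit is raised, at the cost of longer recovery.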

Write-Ahead Log (cont.)
                        • Writes are synchronized across all stores
                          ‣ A large cell in one family can stall all
                            writes to another
                          ‣ In this case the RPC handlers go binary,
                            i.e. either all work or all block
                        • The WAL can be bypassed on writes, but this means
                          no real durability and no replication
                          ‣ Maybe use a coprocessor to restore
                            dependent data sets (preWALRestore)
Flushes

                        • Every mutation call (put, delete etc.) triggers
                          a check for a flush
                        • If the threshold is met, flush the memstore to disk
                          and schedule a compaction
                          ‣ Try to compact newly flushed files quickly
                        • The compaction returns - if necessary -
                          where a region should be split
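The flush check and the compaction's split decision can be sketched with a hypothetical `ToyRegion`, assuming sizes in MB, a compaction that merges everything, and a split decision that only reports *whether* to split (real HBase also computes the split key, typically from the largest store file).

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegion:
    """Hypothetical region: flush check per mutation, split check on compaction."""
    flush_size_mb: float = 128
    max_region_mb: float = 1024
    memstore_mb: float = 0.0
    store_files_mb: list = field(default_factory=list)
    compaction_queue: list = field(default_factory=list)

    def mutate(self, size_mb: float) -> None:
        self.memstore_mb += size_mb
        # Every mutation checks whether the flush threshold has been met.
        if self.memstore_mb >= self.flush_size_mb:
            self.store_files_mb.append(self.memstore_mb)
            self.memstore_mb = 0.0
            self.compaction_queue.append("compact")  # scheduled, not run inline

    def run_compaction(self) -> bool:
        """Merge all store files and report whether the region should split."""
        if not self.compaction_queue:
            return False
        self.compaction_queue.clear()
        self.store_files_mb = [sum(self.store_files_mb)]
        return self.store_files_mb[0] > self.max_region_mb
```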


Compaction Storms
                        • Premature flushing because of the number of logs or
                          memory pressure
                         ‣ Files will be smaller than the configured
                            flush size
                        • The background compactions are hard at
                          work merging small flush files into the
                          existing, larger store files
                         ‣ Rewriting hundreds of MB over and over
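A back-of-the-envelope model of the storm, assuming a deliberately naive policy where every new flush file is immediately merged into the single existing store file (real compaction policies are smarter, but the trend is the same). `mb_rewritten` is hypothetical.

```python
def mb_rewritten(flush_sizes_mb, existing_mb=0):
    """Total MB written by compactions when each flush file is merged
    straight into one existing store file (naive, hypothetical policy)."""
    store_file_mb = existing_mb
    rewritten_mb = 0
    for flush_mb in flush_sizes_mb:
        store_file_mb += flush_mb
        rewritten_mb += store_file_mb  # the whole merged file is written again
    return rewritten_mb


# The same 1280MB of data, flushed at the configured 128MB
# versus prematurely at 16MB.
print(mb_rewritten([128] * 10), mb_rewritten([16] * 80))  # 7040 51840
```

Eight times as many flushes cause roughly seven times the compaction I/O for the same amount of user data.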

Dependencies

                        • Flushes happen across all stores/column
                          families, even if just one triggers it
                        • The flush size is compared to the size of all
                          stores combined
                          ‣ Many column families dilute the size
                          ‣ Example: 55MB + 5MB + 4MB
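The example can be simulated as follows, assuming three families receive writes at a fixed 55:5:4 ratio and a 64MB flush size that is compared against the combined memstore. `flush_files` is hypothetical. Because all families flush together, the two small families produce 5MB and 4MB files instead of files near the flush size.

```python
def flush_files(rates_mb: dict, flush_size_mb: float, total_mb: float) -> dict:
    """Files written per column family when all families share one
    flush trigger based on their combined memstore size."""
    memstores = {cf: 0.0 for cf in rates_mb}
    files = {cf: [] for cf in rates_mb}
    step_mb = sum(rates_mb.values())
    written_mb = 0.0
    while written_mb < total_mb:
        for cf, rate in rates_mb.items():
            memstores[cf] += rate
        written_mb += step_mb
        # The combined size, not any single family, triggers the flush.
        if sum(memstores.values()) >= flush_size_mb:
            for cf in memstores:
                files[cf].append(memstores[cf])
                memstores[cf] = 0.0
    return files


files = flush_files({"a": 55, "b": 5, "c": 4}, flush_size_mb=64, total_mb=640)
print({cf: f[:3] for cf, f in files.items()})
```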


Some Numbers
                        • Typical write performance of HDFS is
                          35-50MB/s

                               Cell Size        OPS
                               0.5MB            70-100
                               100KB            350-500
                               10KB             3500-5000 ??
                               1KB              35000-50000 ????

                        This is way too high in practice - Contention!
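The theoretical OPS column is simply throughput divided by cell size. A short sketch reproducing it, assuming decimal units (1MB = 1000KB), which is what the table's numbers imply:

```python
def ops_per_second(throughput_mb_s: float, cell_size_kb: float) -> float:
    # Decimal units (1MB = 1000KB), matching the table's arithmetic.
    return throughput_mb_s * 1000 / cell_size_kb


for cell_kb in (500, 100, 10, 1):
    low = ops_per_second(35, cell_kb)
    high = ops_per_second(50, cell_kb)
    print(f"{cell_kb:>4} KB: {low:,.0f}-{high:,.0f} ops/s")
```

The formula ignores per-operation costs (RPC handling, log syncs, locking), which is why the small-cell rows are flagged as unrealistic.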

Some More Numbers
                        •   Under real-world conditions the rate is lower, more
                            like 15MB/s or less
                            ‣   Thread contention causes a massive
                                slowdown

                               Cell Size        OPS
                               0.5MB            10
                               100KB            100
                               10KB             800
                               1KB              6000
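Turning the observed rates back into bandwidth (again assuming 1MB = 1000KB) shows they correspond to roughly 5 to 10MB/s, within the "15MB/s or less" range and far below the raw 35-50MB/s HDFS figure. `effective_mb_s` is a hypothetical helper.

```python
# Observed rates from the table above: cell size in KB -> ops/s.
OBSERVED_OPS = {500: 10, 100: 100, 10: 800, 1: 6000}


def effective_mb_s(cell_size_kb: float, ops_per_s: float) -> float:
    """Bandwidth actually achieved, in decimal MB/s."""
    return cell_size_kb * ops_per_s / 1000


for kb, ops in OBSERVED_OPS.items():
    print(f"{kb:>4} KB cells: {effective_mb_s(kb, ops):.1f} MB/s")
```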


Notes
                        • Compute memstore sizes based on the number
                          of regions x flush size
                        • Compute the number of logs to keep based on
                          fill and flush rates
                        • Ultimately the capacity is driven by
                          ‣ Java Heap
                          ‣ Region Count and Size
                          ‣ Key Distribution
Cheat Sheet #1

                        • Ensure you have enough, or large enough,
                          write-ahead logs
                        • Ensure you do not oversubscribe the available
                          memstore space
                        • Ensure the flush size is set large enough, but
                          not too large
                        • Check write-ahead log usage carefully
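The first two rules can be turned into a quick sanity check. This is a sketch assuming every region flushes at the full flush size and logs roll at 95% of the block size; `check_write_config` and its parameters are hypothetical, not HBase settings.

```python
def check_write_config(heap_mb, memstore_frac, flush_size_mb, regions,
                       block_size_mb, max_logs, roll_multiplier=0.95):
    """Return warnings for the cheat-sheet rules (hypothetical helper)."""
    warnings = []
    memstore_mb = heap_mb * memstore_frac
    # Rule: do not oversubscribe the available memstore space.
    if regions * flush_size_mb > memstore_mb:
        warnings.append("memstore oversubscribed: regions x flush size "
                        "exceeds memstore space")
    # Rule: have enough, or large enough, write-ahead logs.
    wal_capacity_mb = max_logs * block_size_mb * roll_multiplier
    if wal_capacity_mb < memstore_mb:
        warnings.append("too few or too small WALs: log count will force "
                        "flushes before memstores fill")
    return warnings


print(check_write_config(10240, 0.4, 128, regions=32,
                         block_size_mb=128, max_logs=32))
```

With a 10GB heap, 32 logs of 128MB x 0.95 cover slightly less than the 4GB of memstore, so log-count-driven premature flushes are possible.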
Cheat Sheet #2
                        • Enable compression to store more data per
                          node
                        • Tweak compaction algorithm to peg
                          background I/O at some level
                        • Consider putting column families of very uneven
                          size in separate tables
                        • Check metrics carefully for block cache,
                          memstore, and all queues

Example
                        •   Java Xmx heap at 10GB
                        •   Memstore share at 40% (default)
                            ‣   10GB Heap x 0.4 = 4GB
                        •   Desired flush size at 128MB
                            ‣   4GB / 128MB = 32 regions max!
                        •   For a WAL size of 128MB x 0.95
                            ‣   4GB / (128MB x 0.95) = ~33 partially uncommitted
                                logs to keep around
                        •   Region size at 20GB
                            ‣   20GB x 32 regions = 640GB raw storage used
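The example's arithmetic as a small function, assuming binary units (1GB = 1024MB). With those units the log count is about 33.7, i.e. 33 whole logs, matching the slide's "~33". `sizing_example` is hypothetical.

```python
def sizing_example(heap_gb=10, memstore_share=0.4, flush_mb=128,
                   wal_block_mb=128, roll_multiplier=0.95, region_gb=20):
    memstore_mb = heap_gb * 1024 * memstore_share        # 10GB x 0.4 = 4GB
    max_regions = int(memstore_mb // flush_mb)           # 4GB / 128MB = 32
    logs_to_keep = memstore_mb / (wal_block_mb * roll_multiplier)  # ~33.7
    raw_storage_gb = region_gb * max_regions             # 20GB x 32 = 640GB
    return {"memstore_mb": memstore_mb, "max_regions": max_regions,
            "logs_to_keep": logs_to_keep, "raw_storage_gb": raw_storage_gb}


print(sizing_example())
```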

Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
Does Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should KnowDoes Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should Know
Pornify CC
 
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdfAutomate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Precisely
 
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
Ivano Malavolta
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
MINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PRMINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PR
MIND CTI
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
James Anderson
 
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent LasterAI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
All Things Open
 
Q1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor PresentationQ1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor Presentation
Dropbox
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
The Changing Compliance Landscape in 2025.pdf
The Changing Compliance Landscape in 2025.pdfThe Changing Compliance Landscape in 2025.pdf
The Changing Compliance Landscape in 2025.pdf
Precisely
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
Does Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should KnowDoes Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should Know
Pornify CC
 
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdfAutomate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Precisely
 
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
Ivano Malavolta
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
MINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PRMINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PR
MIND CTI
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
James Anderson
 
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent LasterAI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
All Things Open
 
Q1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor PresentationQ1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor Presentation
Dropbox
 

HBase Sizing Notes

  • 1. HBase Sizing Notes
    Lars George
    Director EMEA Services @ Cloudera
    lars@cloudera.com
    Saturday, June 30, 12
  • 2. Competing Resources
    • Reads and writes compete for the same low-level resources:
      ‣ Disk (HDFS) and network I/O
      ‣ RPC handlers and threads
    • Otherwise they exercise completely separate code paths
  • 3. Memory Sharing
    • By default, every region server divides its memory (i.e., the configured maximum heap) into:
      ‣ 40% for the in-memory stores (used by write operations)
      ‣ 20% for the block cache (used by read operations)
      ‣ the remaining space (here 40%), which goes to regular Java heap usage (objects etc.)
    • This split needs to be tuned to the workload
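As a quick illustration of the default split, the hypothetical `heap_split` helper below divides a heap into the three shares. The 40% and 20% defaults are taken from the slide; in HBase releases of that era the two shares are configured via `hbase.regionserver.global.memstore.upperLimit` and `hfile.block.cache.size`, but default values have changed across versions, so treat them as assumptions and check your own configuration.

```python
def heap_split(heap_mb: float, memstore_frac: float = 0.4, block_cache_frac: float = 0.2):
    """Split a region server heap (in MB) into memstore, block cache and remaining heap.

    The default fractions mirror the slide; they are assumptions, not
    values read from any configuration file.
    """
    if memstore_frac < 0 or block_cache_frac < 0:
        raise ValueError("fractions must be non-negative")
    if memstore_frac + block_cache_frac >= 1.0:
        raise ValueError("memstore + block cache must leave room for the rest of the heap")
    memstore_mb = heap_mb * memstore_frac
    block_cache_mb = heap_mb * block_cache_frac
    other_mb = heap_mb - memstore_mb - block_cache_mb  # objects, RPC buffers, etc.
    return memstore_mb, block_cache_mb, other_mb


if __name__ == "__main__":
    # A 10GB heap with the default 40% / 20% split
    for name, mb in zip(("memstore", "block cache", "other"), heap_split(10 * 1024)):
        print(f"{name:>12}: {mb:8.1f} MB")
```

For a 10GB heap this yields roughly 4GB of memstore, 2GB of block cache and 4GB for everything else.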
  • 4. Reads
    • Locate the appropriate region server and route the request to it
      ‣ The client caches region locations for faster lookups ➜ consider the prefetching option for fast warm-ups
    • Eliminate store files where possible, using time ranges or Bloom filters
    • Try the block cache; if the block is missing, load it from disk
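The read steps above can be modeled as a toy, in-process sketch: store files are skipped when their time range cannot overlap the query or when their Bloom filter rules the row out, and the surviving files are read through a block cache before falling back to "disk". Everything here (`BloomFilter`, `StoreFile`, `get_row`, one cell per row per file, a plain dict as the cache) is a hypothetical simplification for illustration, not HBase's actual classes or file layout.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


@dataclass
class StoreFile:
    name: str
    min_ts: int
    max_ts: int
    cells: dict  # row key -> value, standing in for the file's blocks
    bloom: BloomFilter = field(default_factory=BloomFilter)

    def __post_init__(self):
        for row in self.cells:
            self.bloom.add(row)


def get_row(row, ts_min, ts_max, store_files, block_cache):
    """Return (values found, number of simulated disk reads)."""
    values, disk_reads = [], 0
    for sf in store_files:
        # 1. Time-range elimination: skip files that cannot hold matching cells
        if sf.max_ts < ts_min or sf.min_ts > ts_max:
            continue
        # 2. Bloom filter elimination: skip files that definitely lack the row
        if not sf.bloom.might_contain(row):
            continue
        # 3. Try the block cache, otherwise "load from disk" and cache the result
        key = (sf.name, row)
        if key in block_cache:
            value = block_cache[key]
        else:
            value = sf.cells.get(row)
            disk_reads += 1
            block_cache[key] = value
        if value is not None:
            values.append(value)
    return values, disk_reads
```

Running the same lookup twice against a shared cache turns the second call into a pure cache hit, while widening the time range makes older files eligible again and costs extra reads.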
  • 5. Block Cache
    • Use the exported metrics to check how effective the block cache is
      ‣ Check fill and eviction rates as well as hit ratios ➜ random reads are not ideal (they yield low hit ratios)
    • Tune it up or down as needed, but watch overall heap usage
    • You absolutely need the block cache
      ‣ Set it to at least 10% for short-term benefits
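The metrics the slide points at (fill, evictions, hit ratio) can be illustrated with a minimal LRU cache that counts them. This is a hypothetical sketch built on `collections.OrderedDict`; HBase's own block cache implementation and metric names differ, so read these counters only as an analogy for what to look for in the exported metrics.

```python
from collections import OrderedDict


class LruBlockCache:
    """Minimal LRU cache that tracks hits, misses and evictions."""

    def __init__(self, capacity_blocks: int):
        if capacity_blocks <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get(self, block_id, load):
        """Return the block, calling load(block_id) on a miss."""
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        block = load(block_id)
        self.blocks[block_id] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
            self.evictions += 1
        return block

    @property
    def fill_ratio(self) -> float:
        return len(self.blocks) / self.capacity

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    import random

    def load(block_id):
        return f"block-{block_id}"

    hot = LruBlockCache(100)
    for i in range(10_000):
        hot.get(i % 50, load)  # working set fits into the cache
    rnd = LruBlockCache(100)
    rng = random.Random(42)
    for _ in range(10_000):
        rnd.get(rng.randrange(10_000), load)  # random reads over a large key space
    print(f"hot set: hit ratio {hot.hit_ratio:.2f}, evictions {hot.evictions}")
    print(f"random:  hit ratio {rnd.hit_ratio:.2f}, evictions {rnd.evictions}")
```

The cyclic workload whose working set fits in the cache reaches a hit ratio close to 1 with no evictions, while uniformly random reads over a much larger key space mostly miss and evict constantly, which is the pattern the slide warns about.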
  • 6. Writes
    • The cluster size is often determined by write performance
    • Writes follow the log-structured merge tree approach:
      ‣ Store each mutation in the in-memory store and the write-ahead log
      ‣ Flush out aggregated, sorted maps at a specified threshold, or when under memory pressure
      ‣ Discard logs that no longer hold pending edits
      ‣ Perform regular compactions of store files
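The log-structured merge steps above can be sketched as a toy, single-region model. `ToyRegion`, its parameters, the in-memory list standing in for the WAL, the row-count flush threshold and the merge-everything compaction are all hypothetical simplifications; real HBase shares one WAL across the regions of a server, flushes by byte size and selects compaction candidates far more carefully.

```python
class ToyRegion:
    """Toy LSM write path: memstore + WAL, flush at a threshold, compaction."""

    def __init__(self, flush_threshold: int, max_store_files: int = 3):
        self.flush_threshold = flush_threshold  # distinct rows buffered (real HBase uses bytes)
        self.max_store_files = max_store_files  # compact when this is exceeded
        self.memstore = {}                      # row -> value, sorted on flush
        self.wal = []                           # (seq_id, row, value) records
        self.store_files = []                   # list of sorted (row, value) lists
        self.seq_id = 0
        self.flushed_seq_id = 0

    def put(self, row, value):
        self.seq_id += 1
        self.wal.append((self.seq_id, row, value))  # 1. log the edit first
        self.memstore[row] = value                  #    then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:  # checked on every mutation
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        # 2. write out an aggregated, sorted map as a new immutable store file
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.flushed_seq_id = self.seq_id
        # 3. discard log entries whose edits are now persisted in store files
        self.wal = [r for r in self.wal if r[0] > self.flushed_seq_id]
        if len(self.store_files) > self.max_store_files:
            self.compact()

    def compact(self):
        # 4. merge all store files; newer files win for duplicate rows
        merged = {}
        for sf in self.store_files:  # oldest first, so later files overwrite
            merged.update(sf)
        self.store_files = [sorted(merged.items())]

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        for sf in reversed(self.store_files):  # newest file first
            for r, v in sf:
                if r == row:
                    return v
        return None
```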
  • 7. Write Performance
    • Many factors contribute to the overall write performance of a cluster:
      ‣ Key distribution ➜ avoid region hotspots
      ‣ Handlers ➜ do not let them pile up too early
      ‣ Write-ahead log ➜ bottleneck #1
      ‣ Compactions ➜ when badly tuned, they cause ever-increasing background noise
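To see why key distribution matters, the sketch below assigns row keys to regions by their start keys and counts writes per region. Monotonically increasing, zero-padded IDs all land in one region, while prefixing each key with a bucket derived from the ID (a common salting technique, not something the slide prescribes) spreads them evenly. The region boundaries, key format and bucket count are hypothetical.

```python
import bisect
from collections import Counter

# Hypothetical pre-split table: region i covers [START_KEYS[i], START_KEYS[i + 1])
START_KEYS = ["", "1", "2", "3"]


def region_for(row_key: str) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect.bisect_right(START_KEYS, row_key) - 1


def sequential_key(seq: int) -> str:
    return f"{seq:010d}"  # e.g. "0000000042"


def salted_key(seq: int, buckets: int = 4) -> str:
    return f"{seq % buckets}-{seq:010d}"  # e.g. "2-0000000042"


def writes_per_region(keys) -> list:
    counts = Counter(region_for(k) for k in keys)
    return [counts.get(i, 0) for i in range(len(START_KEYS))]


if __name__ == "__main__":
    ids = range(1000)
    print("sequential:", writes_per_region(sequential_key(i) for i in ids))
    print("salted:    ", writes_per_region(salted_key(i) for i in ids))
```

This prints `[1000, 0, 0, 0]` for the sequential keys and `[250, 250, 250, 250]` for the salted ones. The trade-off is on the read side: a scan over a contiguous ID range now has to query every bucket.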
  • 8. Write-Ahead Log
    • Currently only one per region server
      ‣ Shared across all stores (i.e., column families)
      ‣ Synchronized on file append calls
    • Work is being done to mitigate this:
      ‣ WAL compression
      ‣ Multiple WALs per region server ➜ start more than one region server per node?
  • 9. Write-Ahead Log (cont.)
    • Log size is set to 95% of the default block size
      ‣ 64MB or 128MB, but check your configuration!
    • Keep the number of logs low to reduce recovery time
      ‣ The limit is set to 32, but can be increased
    • Increase the size of the logs and/or the number of logs kept before blocking
    • Compute that number based on fill distribution and flush frequencies
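The sizing advice above can be turned into simple arithmetic: with logs rolled at 95% of the block size, the number of logs needed to back a completely full memstore is roughly memstore capacity divided by log size, rounded up. If that exceeds the log limit (32 on the slide, the `hbase.regionserver.maxlogs` setting in releases of that era), the region server forces flushes before the memstores are full. The function names are hypothetical, and the model ignores uneven fill across regions.

```python
import math

MB = 1024 * 1024


def wal_roll_size(block_size_bytes: int, multiplier: float = 0.95) -> float:
    """Size at which a write-ahead log is rolled (95% of the block size per the slide)."""
    return block_size_bytes * multiplier


def logs_to_cover_memstore(memstore_bytes: float, block_size_bytes: int,
                           multiplier: float = 0.95) -> int:
    """Logs needed so that a full global memstore is still backed by live WAL files."""
    return math.ceil(memstore_bytes / wal_roll_size(block_size_bytes, multiplier))


def forces_premature_flushes(memstore_bytes: float, block_size_bytes: int,
                             max_logs: int = 32, multiplier: float = 0.95) -> bool:
    """True if the log limit is reached before the memstore space is used up."""
    return logs_to_cover_memstore(memstore_bytes, block_size_bytes, multiplier) > max_logs


if __name__ == "__main__":
    for block_mb in (64, 128):
        n = logs_to_cover_memstore(4096 * MB, block_mb * MB)
        print(f"{block_mb}MB blocks: {n} logs, premature flushes: "
              f"{forces_premature_flushes(4096 * MB, block_mb * MB)}")
```

With a 4GB memstore and 128MB blocks this needs 34 logs, just over the limit of 32, so some flushes would be forced early; raising the log limit or the log size, as the slide suggests, removes that pressure.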
  • 10. Write-Ahead Log (cont.)
    • Writes are synchronized across all stores
      ‣ A large cell in one column family can stall all writes to another
      ‣ In this case the RPC handlers go binary, i.e., either they all work or they all block
    • The WAL can be bypassed on writes, but that means no real durability and no replication
      ‣ Maybe use a coprocessor to restore dependent data sets (preWALRestore)
  • 11. Flushes
    • Every mutation call (put, delete, etc.) triggers a check for whether a flush is needed
    • If the threshold is met, the memstore is flushed to disk and a compaction is scheduled
      ‣ Try to compact newly flushed files quickly
    • The compaction returns, if necessary, the point at which the region should be split
  • 12. Compaction Storms
    • Premature flushes are caused by the number of logs or by memory pressure
      ‣ The resulting files are smaller than the configured flush size
    • The background compactions are hard at work merging these small flush files into the existing, larger store files
      ‣ Hundreds of MB get rewritten over and over
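A rough sketch of why premature flushes hurt: if every small flush file is immediately merged into one ever-growing store file (a deliberately naive policy, not how HBase actually selects files), each merge rewrites the whole existing file again. The function is hypothetical; it only shows how the bytes rewritten grow relative to the bytes written.

```python
def naive_compaction_rewrite(flush_mb: float, num_flushes: int, initial_store_mb: float = 0.0):
    """Total MB rewritten when each flush file is merged into a single store file.

    Returns (total_rewritten_mb, final_store_mb).
    """
    store_mb = initial_store_mb
    rewritten = 0.0
    for _ in range(num_flushes):
        # Merging reads the existing store file plus the new flush file
        # and writes out a combined file of the same total size.
        store_mb += flush_mb
        rewritten += store_mb
    return rewritten, store_mb


if __name__ == "__main__":
    for flush in (64.0, 8.0):       # full-size vs premature flushes
        n = int(1024 // flush)      # write 1GB of new data in total
        total, final = naive_compaction_rewrite(flush, n, initial_store_mb=500.0)
        print(f"flush {flush:5.1f}MB x {n:4d}: rewrote {total / 1024:7.1f}GB to add 1GB")
```

With 500MB already in the store, adding 1GB in 64MB flushes rewrites about 16GB, whereas the same 1GB in 8MB flushes rewrites about 127GB. HBase's compaction selection is designed to avoid some of this, so real figures are lower, but the direction is the same.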
  • 13. Dependencies
    • Flushes happen across all stores (column families), even if just one triggers them
    • The flush size is compared to the combined size of all stores
      ‣ Many column families dilute the size
      ‣ Example: 55MB + 5MB + 4MB = 64MB; with a 64MB flush size, one flush writes a 55MB file plus two small 5MB and 4MB files
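The slide's example can be worked through in code: the flush decision looks at the region's combined memstore size, so when the total reaches the flush size every column family is written out, however little data it holds. The hypothetical `flush_files` helper, the 64MB flush size and the family names are assumptions chosen so that the slide's numbers add up; skipping empty families is also an assumption of this sketch.

```python
def flush_files(family_sizes_mb: dict, flush_size_mb: float) -> dict:
    """Return the store file sizes produced by one region flush, or {} if none.

    Models the behaviour described on the slide: the threshold is compared
    with the sum of all column families, and all of them are flushed together.
    """
    total = sum(family_sizes_mb.values())
    if total < flush_size_mb:
        return {}
    # Every non-empty family produces a file, regardless of its own size
    return {family: size for family, size in family_sizes_mb.items() if size > 0}


if __name__ == "__main__":
    families = {"cf_main": 55.0, "cf_meta": 5.0, "cf_idx": 4.0}
    files = flush_files(families, flush_size_mb=64.0)
    small = [f for f, s in files.items() if s < 0.25 * 64.0]
    print(files, "small files:", small)
```

Two of the three files written are only 4-5MB, and each of them becomes compaction work later.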
  • 14. Some Numbers
    • Typical HDFS write performance is 35-50MB/s

      Cell Size   OPS
      0.5MB       70-100
      100KB       350-500
      10KB        3500-5000 ??
      1KB         35000-50000 ????

    • This is way too high in practice: contention!
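The table is simply the HDFS write throughput divided by the cell size. A quick sketch reproducing it, assuming decimal units (1MB = 1000KB) as the slide's round numbers suggest; the function name is hypothetical.

```python
def ops_per_second(throughput_mb_s: float, cell_size_kb: float) -> float:
    """Theoretical writes per second if throughput were the only limit (1MB = 1000KB)."""
    return throughput_mb_s * 1000 / cell_size_kb


if __name__ == "__main__":
    for cell_kb in (500, 100, 10, 1):
        low, high = ops_per_second(35, cell_kb), ops_per_second(50, cell_kb)
        print(f"{cell_kb:>4}KB: {low:,.0f}-{high:,.0f} ops/s")
```

The printed ranges match the table; nothing in this arithmetic accounts for contention, which is exactly the slide's caveat.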
  • 15. Some More Numbers
    • Under real-world conditions the rate is lower, more like 15MB/s or less
      ‣ Thread contention causes a massive slowdown

      Cell Size   OPS
      0.5MB       10
      100KB       100
      10KB        800
      1KB         6000
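Multiplying the observed operation rates back by the cell size shows the effective throughput behind this table. The sketch below uses only the numbers from the slide and again assumes 1MB = 1000KB; the names are hypothetical.

```python
# Observed rates from the slide: cell size in KB -> operations per second
OBSERVED_OPS = {500: 10, 100: 100, 10: 800, 1: 6000}


def effective_mb_per_s(cell_size_kb: float, ops: float) -> float:
    """Throughput implied by an operation rate (1MB = 1000KB)."""
    return cell_size_kb * ops / 1000


if __name__ == "__main__":
    for kb, ops in OBSERVED_OPS.items():
        print(f"{kb:>4}KB x {ops:>5} ops/s = {effective_mb_per_s(kb, ops):4.1f}MB/s")
```

The implied rates are 5, 10, 8 and 6MB/s, all below the 15MB/s quoted above, consistent with its "or less".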
  • 16. Notes
    • Compute memstore sizes based on number of regions x flush size
    • Compute the number of logs to keep based on fill and flush rates
    • Ultimately, capacity is driven by:
      ‣ Java heap
      ‣ Region count and size
      ‣ Key distribution
  • 17. Cheat Sheet #1
    • Ensure you have enough write-ahead logs, or large enough ones
    • Ensure you do not oversubscribe the available memstore space
    • Set the flush size large enough, but not too large
    • Check write-ahead log usage carefully
  • 18. Cheat Sheet #2
    • Enable compression to store more data per node
    • Tune the compaction algorithm to keep background I/O at a steady, controlled level
    • Consider putting uneven column families in separate tables
    • Check the metrics for the block cache, memstore, and all queues carefully
  • 19. Example
    • Java Xmx heap at 10GB
    • Memstore share at 40% (default)
      ‣ 10GB heap x 0.4 = 4GB
    • Desired flush size at 128MB
      ‣ 4GB / 128MB = 32 regions max!
    • For a WAL size of 128MB x 0.95 (i.e., 95% of the block size)
      ‣ 4GB / (128MB x 0.95) = ~33 logs, each still holding unflushed edits, to keep around
    • Region size at 20GB
      ‣ 20GB x 32 regions = 640GB of raw storage used
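The example can be reproduced with a short calculator. Inputs mirror the slide (10GB heap, 40% memstore share, 128MB flush size, 128MB WAL block size rolled at 95%, 20GB regions); the function and field names are hypothetical, and the model assumes every region fills its memstore evenly, which real workloads rarely do.

```python
from dataclasses import dataclass

GB_IN_MB = 1024


@dataclass
class SizingResult:
    memstore_mb: float
    max_regions: int
    wal_logs: float
    raw_storage_gb: float


def size_region_server(heap_gb: float, memstore_share: float, flush_size_mb: float,
                       wal_block_mb: float, region_size_gb: float,
                       wal_multiplier: float = 0.95) -> SizingResult:
    memstore_mb = heap_gb * GB_IN_MB * memstore_share
    # Regions that can each hold a full flush-size memstore without oversubscribing
    max_regions = int(memstore_mb // flush_size_mb)
    # WAL files needed to back a full memstore (logs rolled at 95% of the block size)
    wal_logs = memstore_mb / (wal_block_mb * wal_multiplier)
    raw_storage_gb = region_size_gb * max_regions
    return SizingResult(memstore_mb, max_regions, wal_logs, raw_storage_gb)


if __name__ == "__main__":
    r = size_region_server(heap_gb=10, memstore_share=0.4, flush_size_mb=128,
                           wal_block_mb=128, region_size_gb=20)
    print(f"memstore space: {r.memstore_mb / GB_IN_MB:.1f}GB")
    print(f"max regions:    {r.max_regions}")
    print(f"WAL logs:       ~{r.wal_logs:.1f}")
    print(f"raw storage:    {r.raw_storage_gb:.0f}GB")
```

This reports 4.0GB of memstore space, 32 regions, about 33.7 logs (which the slide rounds to ~33) and 640GB of raw storage.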