openTSDB - Metrics for
a distributed world
Oliver Hankeln / gutefrage.net
@mydalon

Wednesday, 30 October 2013
Who am I?
Senior Engineer - Data and Infrastructure at
gutefrage.net GmbH
Previously worked in software development
DevOps advocate

Who is Gutefrage.net?
Germany's biggest Q&A platform
#1 German site (mobile), about 5M unique users
#3 German site (desktop), about 17M unique users
> 4 million page impressions (PI) per day

Part of the Holtzbrinck group
Running several platforms (Gutefrage.net,
Helpster.de, Cosmiq, Comprano, ...)

What you will get
Why we chose openTSDB
What is openTSDB?
How does openTSDB store the data?
Our experiences
Some advice

Why we chose
openTSDB

We were looking at some options

                   Munin   Graphite   openTSDB   Ganglia
Scales well        no      sort of    yes        yes
Keeps all data     no      no         yes        no
Creating metrics   easy    easy       easy       easy

We have a winner!
openTSDB is the only one of the four that scales well and keeps all data; creating metrics is easy in all of them.
Bingo!

Separation of concerns
$ unzip|strip|touch|finger|grep|mount|fsck|more|yes|
fsck|fsck|fsck|umount|sleep

UI was not important for our decision
Alerting is not what we are looking for in
our time series database

The ecosystem
App feeds metrics in via RabbitMQ
We base Icinga checks on the metrics
We are evaluating Skyline and Oculus by Etsy for
anomaly detection
We deploy sensors via chef

openTSDB
Written by Benoît Sigoure at StumbleUpon
Open source (get it from GitHub)
Uses HBase (which runs on top of HDFS) for
storage
Distributed system (multiple TSDs)

The big picture
[Diagram: tcollector, the UI and API clients talk to several TSDs; all TSDs store their data in HBase]
HBase: this is really a cluster

Putting data into openTSDB
$ telnet tsd01.acme.com 4242
put proc.load.avg5min 1382536472 23.2 host=db01.acme.com
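The same put line can also be sent from a script instead of an interactive telnet session. The following is a minimal sketch, assuming the TSD accepts the plain-text line shown above on its TCP port; the helper names `format_put` and `send_put` are made up for this illustration and are not part of openTSDB.

```python
import socket


def format_put(metric, timestamp, value, tags):
    """Build one line of the telnet-style `put` command shown above."""
    if not tags:
        raise ValueError("openTSDB expects at least one tag per data point")
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_str}\n"


def send_put(host, port, line, timeout=5.0):
    """Open a TCP connection to a TSD and send a single put line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))


line = format_put("proc.load.avg5min", 1382536472, 23.2, {"host": "db01.acme.com"})
print(line, end="")
# send_put("tsd01.acme.com", 4242, line)  # uncomment against a reachable TSD
```

Keeping the formatting separate from the network call makes the line easy to check without a running TSD.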

It gets even better
tcollector is a Python script that runs your
collectors
handles network connection, starts your
collectors at set intervals
does basic process management
adds host tag, does deduplication

A simple tcollector script
#!/usr/bin/php
<?php
#Cast a die
$die = rand(1,6);
echo "roll.a.d6 " . time() . " " . $die . "\n";
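Since tcollector itself is written in Python, the same collector could look like this in Python. This is a sketch under the assumption, taken from the PHP script above, that a collector only has to print one "metric timestamp value" line per data point to stdout; `roll_line` is a name chosen for this example.

```python
import random
import sys
import time


def roll_line(now=None, rng=random):
    """Return one data point as 'metric timestamp value'."""
    ts = int(time.time() if now is None else now)
    return f"roll.a.d6 {ts} {rng.randint(1, 6)}"


print(roll_line())
sys.stdout.flush()  # make sure the line is not stuck in a buffer
```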

What was that HDFS
again?
HDFS is a distributed filesystem suitable for
Petabytes of data on thousands of machines.
Runs on commodity hardware
Takes care of redundancy
Used by e.g. Facebook, Spotify, eBay,...

Okay... and HBase?
HBase is a NoSQL database / data store on
top of HDFS
Modeled after Google's BigTable
Built for big tables (billions of rows, millions
of columns)
Automatic sharding by row key

How openTSDB stores
the data

Keys are key!
Data is sharded across regions based on
their row key
You query data based on the row key
You can query row key ranges (e.g. A...D)
So: think about key design
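Why key design matters can be seen with a toy model: rows kept sorted by key, and a range query that only touches the slice between two keys. This is an in-memory stand-in written for illustration, not the HBase client API; the keys and values are invented.

```python
from bisect import bisect_left


def range_scan(sorted_rows, start_key, stop_key):
    """Return rows with start_key <= key < stop_key from a key-sorted list."""
    keys = [key for key, _ in sorted_rows]
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, stop_key)
    return sorted_rows[lo:hi]


rows = sorted([("A1", 17), ("B7", 24), ("C3", 12), ("D9", 99), ("E2", 5)])
print(range_scan(rows, "A", "D"))
```

Anything you want to read together should therefore share a key prefix, because only then does it end up in one contiguous slice.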

Take 1
Row key format: timestamp, metric id

Row key          Value
1382536472, 5    17
1382536472, 6    24
1382536472, 8    12
1382536473, 5    134
1382536473, 6    10
1382536473, 8    99
1382536474, 5    12
1382536474, 6    42

[Diagram: rows placed on Server A and Server B]
Keys sort by timestamp, so consecutive writes always land next to each other in the same region.

Solution: Swap timestamp and metric id
Row key format: metric id, timestamp

Row key          Value
5, 1382536472    17
6, 1382536472    24
8, 1382536472    12
5, 1382536473    134
6, 1382536473    10
8, 1382536473    99
5, 1382536474    12
6, 1382536474    42

[Diagram: rows placed on Server A and Server B]
Writes for different metrics now land in different key ranges, and all data points of one metric are stored next to each other.
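The effect of swapping the two key parts can be sketched with the data points from the slides and two servers separated by a single split key. The split keys below are hypothetical and chosen only to make the point; real region boundaries are decided by HBase.

```python
from collections import Counter

# (timestamp, metric id) pairs in arrival order, taken from the slides
POINTS = [
    (1382536472, 5), (1382536472, 6), (1382536472, 8),
    (1382536473, 5), (1382536473, 6), (1382536473, 8),
    (1382536474, 5), (1382536474, 6),
]


def region_for(key, split_key):
    """Rows below the split key live on server A, the rest on server B."""
    return "A" if key < split_key else "B"


def write_load(keys, split_key):
    """Count how many writes each server receives."""
    return Counter(region_for(k, split_key) for k in keys)


# Take 1: timestamp first. The (hypothetical) split key is an older timestamp,
# so every new row sorts after it.
take1 = write_load([(ts, m) for ts, m in POINTS], split_key=(1382530000, 0))

# Solution: metric id first. The (hypothetical) split key falls between metric ids.
take2 = write_load([(m, ts) for ts, m in POINTS], split_key=(6, 0))

print("timestamp first:", dict(take1))
print("metric id first:", dict(take2))
```

With the timestamp first, one server takes every write; with the metric id first, both servers share the load.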
Take 2
Metric ID first, then timestamp
Searching through many rows is slower than
searching through fewer rows. (Obviously)
So: Put multiple data points into one row

Take 2 continued

Row key          Cell name:    +23   +35   +94   +142
5, 1382608800    Data point:   17    1     23    42

Row key          Cell name:    +13   +25   +88   +89
5, 1382612400    Data point:   3     44    12    2

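The layout above can be reproduced in a few lines: the row key carries the timestamp rounded down to the hour, and the cell name is the offset in seconds within that hour. This is a sketch of the idea using plain dictionaries, not openTSDB's actual encoding; the function names are invented for the example.

```python
from collections import defaultdict

ROW_SPAN = 3600  # seconds per row, matching the hourly base timestamps above


def split_timestamp(ts):
    """Split a Unix timestamp into (row base time, offset within the row)."""
    base = ts - ts % ROW_SPAN
    return base, ts - base


def to_rows(metric_id, points):
    """Group (timestamp, value) pairs into rows keyed by (metric id, base time)."""
    rows = defaultdict(dict)
    for ts, value in points:
        base, offset = split_timestamp(ts)
        rows[(metric_id, base)][offset] = value
    return dict(rows)


points = [(1382608823, 17), (1382608835, 1), (1382608894, 23), (1382608942, 42),
          (1382612413, 3), (1382612425, 44), (1382612488, 12), (1382612489, 2)]
for key, cells in to_rows(5, points).items():
    print(key, cells)
```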
Where are the tags
stored?
They are put at the end of the row key
Both tag names and tag values are
represented by IDs

The Row Key
3 Bytes - metric ID
4 Bytes - timestamp (rounded down to the
hour)
3 Bytes tag ID
3 Bytes tag value ID
Total: 7 Bytes + 6 Bytes * Number of tags
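The byte layout can be checked with a small packing function. This is a sketch only: the numeric IDs are made up, and big-endian encoding and sorting the tags by ID are assumptions made here so that keys compare and repeat consistently; the slides do not specify either.

```python
def pack_row_key(metric_id, timestamp, tags):
    """Pack metric id, hourly base time and (tag id, tag value id) pairs into bytes."""
    base_time = timestamp - timestamp % 3600          # rounded down to the hour
    key = metric_id.to_bytes(3, "big")                # 3 bytes metric ID
    key += base_time.to_bytes(4, "big")               # 4 bytes timestamp
    for tag_id, value_id in sorted(tags):             # assumption: sorted by tag ID
        key += tag_id.to_bytes(3, "big")              # 3 bytes tag ID
        key += value_id.to_bytes(3, "big")            # 3 bytes tag value ID
    return key


key = pack_row_key(5, 1382536472, [(1, 42)])
print(key.hex(), len(key))
```

With one tag the key is 7 + 6 = 13 bytes long, and every further tag adds another 6 bytes.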

Let's look at some
graphs

Busting some Myths

Myth: Keeping Data is
expensive
Gartner put the price of enterprise SSDs at
$1/GB in 2013
A data point gets compressed to 2-3 Bytes
A metric that you measure every second
then uses 18.9 ct of disk space per year.
Usually it is even cheaper

If your work costs $50 per hour and it
takes you only one minute to think about
and configure your RRD compaction
setting, you could have collected that
metric on a second-by-second basis for
4.4 YEARS instead.
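The two numbers can be retraced with a short calculation. Note the assumption: 18.9 ct per year at $1/GB works out to about 6 bytes on disk per data point, which would fit, for example, 2 bytes stored three times. The slides do not say how the figure was derived, so treat `BYTES_PER_POINT` as a guess.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600      # 31,536,000 data points per year
PRICE_PER_GB = 1.00                      # USD, enterprise SSD price from the slide
BYTES_PER_GB = 10**9

# Assumption: 6 bytes on disk per data point (e.g. 2 bytes stored three times).
BYTES_PER_POINT = 6


def yearly_cost(bytes_per_point=BYTES_PER_POINT):
    """Disk cost in USD for one metric sampled once per second for a year."""
    return SECONDS_PER_YEAR * bytes_per_point / BYTES_PER_GB * PRICE_PER_GB


def years_paid_by(minutes_of_work, hourly_rate=50.0):
    """How many years of storage the given working time would pay for."""
    return (hourly_rate * minutes_of_work / 60) / yearly_cost()


print(f"{yearly_cost() * 100:.1f} ct per year")
print(f"{years_paid_by(1):.1f} years")
```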

Myth: the amount of
metrics is too limited
Don't confuse Graphite metric count with
openTSDB metric count.
3 Bytes of metric ID = 16.7M possibilities
3 Bytes tag value ID = 16.7M possibilities
=> at least 280 T metrics (graphite counting)
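The arithmetic behind these numbers, as a quick check. It only combines one metric ID with one tag value ID, which is why the slide says "at least".

```python
ID_BYTES = 3
ids = 2 ** (8 * ID_BYTES)          # distinct values of a 3-byte ID

# One metric ID combined with one tag value ID, counted Graphite-style
# (every metric/tag-value combination is its own series).
combinations = ids * ids

print(f"{ids:,} IDs per 3-byte field")
print(f"{combinations:,} combinations (about {combinations / 1e12:.0f} trillion)")
```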

Cultural issues

Tools shape culture
shapes tools
It is time for a new monitoring culture!
Embrace machine learning!
Monitor everything in your organisation!
Throw off the shackles of fixed intervals!
Come, join the revolution!

Our experiences

What works well
We store about 200M data points in several
thousand time series with no issues
tcollector decouples measurement from
storage
Creating new metrics is really easy
You are free to choose your rhythm

Challenges
The UI is seriously lacking
no annotation support out of the box
no meta data for time series
Only 1s time resolution (and only 1 value/s/
time series)

Salvation is coming
OpenTSDB 2 is around the corner
millisecond precision
annotations and meta data
improved API
improved UI

Friendly advice
Pick a naming scheme and stick to it
Use tags wisely (not more than 6 or 7 tags
per data point)
Use tcollector
wait for openTSDB 2 ;-)

Questions?
Please contact me:
oliver.hankeln@gutefrage.net
@mydalon
I'll upload the slides and tweet about it

RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Maarten Verwaest
 
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Markus Eisele
 
Top-AI-Based-Tools-for-Game-Developers (1).pptx
Top-AI-Based-Tools-for-Game-Developers (1).pptxTop-AI-Based-Tools-for-Game-Developers (1).pptx
Top-AI-Based-Tools-for-Game-Developers (1).pptx
BR Softech
 
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
IT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information TechnologyIT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information Technology
SHEHABALYAMANI
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
James Anderson
 
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
SOFTTECHHUB
 
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptxSmart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Seasia Infotech
 
How to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabberHow to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabber
eGrabber
 
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier VroomAI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
UXPA Boston
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Christian Folini
 
Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)
Kaya Weers
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 

openTSDB - Metrics for a distributed world

  • 1. openTSDB - Metrics for a distributed world. Oliver Hankeln / gutefrage.net, @mydalon. Wednesday, 30 October 2013
  • 2. Who am I? Senior Engineer - Data and Infrastructure at gutefrage.net GmbH. Was doing software development before. DevOps advocate.
  • 3. Who is Gutefrage.net? Germany's biggest Q&A platform. #1 German site (mobile), about 5M unique users. #3 German site (desktop), about 17M unique users. > 4 million PI/day. Part of the Holtzbrinck group. Running several platforms (Gutefrage.net, Helpster.de, Cosmiq, Comprano, ...).
  • 4. What you will get: Why we chose openTSDB. What is openTSDB? How does openTSDB store the data? Our experiences. Some advice.
  • 6. We were looking at some options:
                         Munin   Graphite   openTSDB   Ganglia
      Scales well        no      sort of    yes        yes
      Keeps all data     no      no         yes        no
      Creating metrics   easy    easy       easy       easy
  • 7. We have a winner! The same comparison of Munin, Graphite, openTSDB and Ganglia, with the openTSDB column (scales well: yes; keeps all data: yes; creating metrics: easy) marked "Bingo!".
  • 9. Separation of concerns.
      $ unzip|strip|touch|finger|grep|mount|fsck|more|yes|fsck|fsck|fsck|umount|sleep
      UI was not important for our decision. Alerting is not what we are looking for in our time series database.
  • 10. The ecosystem. App feeds metrics in via RabbitMQ. We base Icinga checks on the metrics. We evaluate Skyline and Oculus by Etsy for anomaly detection. We deploy sensors via Chef.
  • 11. openTSDB. Written by Benoît Sigoure at StumbleUpon. Open source (get it from GitHub). Uses HBase (which is based on HDFS) as storage. Distributed system (multiple TSDs).
  • 12. The big picture. Diagram components: UI, tcollector, API; four TSDs; HBase. Annotation: "This is really a cluster".
  • 13. Putting data into openTSDB:
      $ telnet tsd01.acme.com 4242
      put proc.load.avg5min 1382536472 23.2 host=db01.acme.com
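The telnet session on slide 13 can also be scripted. The following is a minimal sketch, assuming the TSD accepts the same `put` line over a plain TCP connection on port 4242 as in the telnet example. The helper names `format_put` and `send_put` are hypothetical, and sorting the tags is only a choice made here for stable output.

```python
import socket


def format_put(metric, timestamp, value, tags):
    """Build one 'put' line in the layout shown on slide 13."""
    tag_part = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_part}\n"


def send_put(line, host, port=4242):
    """Send one put line to a TSD (host is a placeholder; not called here)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


line = format_put("proc.load.avg5min", 1382536472, 23.2,
                  {"host": "db01.acme.com"})
print(line, end="")
```

Calling `send_put(line, "tsd01.acme.com")` would need a reachable TSD, so the sketch only prints the formatted line.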
  • 14. It gets even better: tcollector is a Python script that runs your collectors. It handles the network connection, starts your collectors at set intervals, does basic process management, adds the host tag and does deduplication.
  • 15. A simple tcollector script:
      #!/usr/bin/php
      <?php
      # Cast a die
      $die = rand(1,6);
      echo "roll.a.d6 " . time() . " " . $die . "\n";
  • 16. What was that HDFS again? HDFS is a distributed filesystem suitable for petabytes of data on thousands of machines. Runs on commodity hardware. Takes care of redundancy. Used by e.g. Facebook, Spotify, eBay, ...
  • 17. Okay... and HBase? HBase is a NoSQL database / data store on top of HDFS. Modeled after Google's BigTable. Built for big tables (billions of rows, millions of columns). Automatic sharding by row key.
  • 18. How openTSDB stores the data
  • 19. Keys are key! Data is sharded across regions based on the row key. You query data based on the row key. You can query row key ranges (e.g. A...D). So: think about key design.
  • 20. Take 1. Row key format: timestamp, metric id.
  • 21. Take 1. Row key format: timestamp, metric id. Rows (row key → value): (1382536472, 5) → 17. Region servers: Server A, Server B.
  • 22. Take 1. Row key format: timestamp, metric id. Rows: (1382536472, 5) → 17; (1382536472, 6) → 24. Region servers: Server A, Server B.
  • 23. Take 1. Row key format: timestamp, metric id. Rows: (1382536472, 5) → 17; (1382536472, 6) → 24; (1382536472, 8) → 12; (1382536473, 5) → 134; (1382536473, 6) → 10; (1382536473, 8) → 99. Region servers: Server A, Server B.
  • 24. Take 1. Row key format: timestamp, metric id. Rows: (1382536472, 5) → 17; (1382536472, 6) → 24; (1382536472, 8) → 12; (1382536473, 5) → 134; (1382536473, 6) → 10; (1382536473, 8) → 99; (1382536474, 5) → 12; (1382536474, 6) → 42. Region servers: Server A, Server B.
  • 25. Solution: Swap timestamp and metric id. Row key format: metric id, timestamp. Rows: (5, 1382536472) → 17; (6, 1382536472) → 24; (8, 1382536472) → 12; (5, 1382536473) → 134; (6, 1382536473) → 10; (8, 1382536473) → 99; (5, 1382536474) → 12; (6, 1382536474) → 42. Region servers: Server A, Server B.
  • 26. Solution: Swap timestamp and metric id. Same content as slide 25.
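The difference between the two key layouts becomes visible once the keys are sorted, since HBase keeps rows ordered by row key. The following is a small sketch using the data points from slides 21 to 25; Python tuples stand in for row keys and `scan` is a hypothetical stand-in for a row key range query.

```python
# (timestamp, metric_id, value) triples copied from the slides
points = [
    (1382536472, 5, 17), (1382536472, 6, 24), (1382536472, 8, 12),
    (1382536473, 5, 134), (1382536473, 6, 10), (1382536473, 8, 99),
    (1382536474, 5, 12), (1382536474, 6, 42),
]

# Take 1: timestamp first. Sorted order equals arrival order, so every
# new write lands at the very end of the key space.
take1 = sorted((ts, metric) for ts, metric, _ in points)

# Solution: metric id first. Each metric forms one contiguous key range.
solution = sorted((metric, ts) for ts, metric, _ in points)


def scan(keys, start, stop):
    """Return the keys in [start, stop), like a row key range query."""
    return [k for k in keys if start <= k < stop]


# All rows of metric 5 come back from a single range scan.
metric5 = scan(solution, (5,), (6,))
print(metric5)
```

With timestamp-first keys the newest rows always sit next to each other, which tends to concentrate writes on whichever server holds the end of the key range. With metric-first keys, writes for different metrics fall into different ranges.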
  • 27. Take 2. Metric ID first, then timestamp. Searching through many rows is slower than searching through fewer rows. (Obviously.) So: put multiple data points into one row.
  • 28. Take 2 continued. Row (5, 1382608800): +23 → 17; +35 → 1; +94 → 23; +142 → 42. Row (5, 1382612400): +13 → 3; +25 → 44; +88 → 12; +89 → 2.
  • 29. Take 2 continued. Same rows as slide 28, with the label "Row key" added for (5, 1382608800) and (5, 1382612400).
  • 30. Take 2 continued. Same rows as slide 28, with the labels "Row key" and "Cell Name" (the offsets +23, +35, ...).
  • 31. Take 2 continued. Row key: (5, 1382608800); cell names: +23, +35, +94, +142; data points: 17, 1, 23, 42. Row key: (5, 1382612400); cell names: +13, +25, +88, +89; data points: 3, 44, 12, 2.
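The row layout of slides 28 to 31 can be sketched as follows. This is only an illustration with a nested dict standing in for the HBase table: the row key holds the metric ID and the timestamp rounded down to the hour, and the cell name holds the offset in seconds within that hour. The names `split_timestamp` and `store` are hypothetical.

```python
ROW_SPAN = 3600  # seconds covered by one row (one hour, as on the slides)


def split_timestamp(ts):
    """Split a Unix timestamp into (row base time, offset inside the row)."""
    base = ts - ts % ROW_SPAN
    return base, ts - base


def store(table, metric_id, ts, value):
    """Put one data point into the row for its metric and hour."""
    base, offset = split_timestamp(ts)
    table.setdefault((metric_id, base), {})[offset] = value


table = {}
samples = [(1382608823, 17), (1382608835, 1), (1382608894, 23),
           (1382608942, 42), (1382612413, 3)]
for ts, value in samples:
    store(table, 5, ts, value)

for row_key, cells in table.items():
    print(row_key, cells)
```

The first four samples end up in the row (5, 1382608800) under the cell names 23, 35, 94 and 142; the fifth opens the next row (5, 1382612400).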
  • 32. Where are the tags stored? They are put at the end of the row key. Both tag names and tag values are represented by IDs.
  • 33. The row key: 3 bytes metric ID; 4 bytes timestamp (rounded down to the hour); per tag, 3 bytes tag ID and 3 bytes tag value ID. Total: 7 bytes + 6 bytes * number of tags.
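The byte layout from slide 33 can be sketched as below. The field widths come from the slide; the big-endian encoding, the sorting of tag pairs and the concrete IDs (metric 5, tag 1, tag value 42) are assumptions made for this illustration, and `uid` and `row_key` are hypothetical helper names.

```python
import struct


def uid(n):
    """Encode an ID as 3 bytes (big-endian is an assumption here)."""
    return n.to_bytes(3, "big")


def row_key(metric_id, ts, tags):
    """Build metric ID + hour-aligned timestamp + (tag ID, tag value ID) pairs."""
    base = ts - ts % 3600                 # round down to the hour
    key = uid(metric_id) + struct.pack(">I", base)
    for tag_id, value_id in sorted(tags):  # sorting is an assumption
        key += uid(tag_id) + uid(value_id)
    return key


key = row_key(5, 1382608823, [(1, 42)])
print(len(key), key.hex())
```

With one tag the key is 7 + 6 * 1 = 13 bytes long, which matches the formula on the slide.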
  • 34. Let's look at some graphs
  • 36. Myth: keeping data is expensive. Gartner found the price for enterprise SSDs at $1/GB in 2013. A data point gets compressed to 2-3 bytes. A metric that you measure every second then uses disk space worth 18.9 ct per year. Usually it is even cheaper.
  • 37. If your work costs $50 per hour and it takes you only one minute to think about and configure your RRD compaction setting, you could have collected that metric on a second-by-second basis for 4.4 YEARS instead.
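The figures on slides 36 and 37 can be checked with a few lines of arithmetic. The slides do not show how 18.9 ct is derived. One reading that reproduces it is 6 bytes on disk per data point, for example 2 bytes stored three times; that factor is an assumption of this sketch, not something the slides state. A GB is taken as 10^9 bytes.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
PRICE_PER_GB = 1.00           # dollars per GB, from slide 36
BYTES_PER_POINT_ON_DISK = 6   # assumption: e.g. 2 bytes kept in 3 copies


def yearly_cost(bytes_per_point, points_per_second=1):
    """Dollars per year of disk space for one time series."""
    gigabytes = SECONDS_PER_YEAR * points_per_second * bytes_per_point / 1e9
    return gigabytes * PRICE_PER_GB


cost = yearly_cost(BYTES_PER_POINT_ON_DISK)
one_minute_of_work = 50 / 60  # dollars, at $50 per hour (slide 37)
years = one_minute_of_work / cost
print(f"{cost * 100:.1f} ct per year, {years:.1f} years per minute of work")
```

Under that assumption the sketch prints 18.9 ct per year and 4.4 years, matching both slides. With 2 to 3 bytes per point and no extra copies the yearly cost would be lower still.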
  • 38. Myth: the number of metrics is too limited. Don't confuse Graphite metric count with openTSDB metric count. 3 bytes of metric ID = 16.7M possibilities. 3 bytes of tag value ID = 16.7M possibilities. => at least 280 T metrics (Graphite counting).
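The numbers on slide 38 follow directly from the ID widths. As a quick check, counting one metric ID combined with one tag value ID, which is the combination the slide multiplies:

```python
metric_ids = 2 ** (3 * 8)     # 3 bytes of metric ID
tag_value_ids = 2 ** (3 * 8)  # 3 bytes of tag value ID
combinations = metric_ids * tag_value_ids

print(f"{metric_ids:,} metric IDs")
print(f"{combinations:,} metric/tag value combinations")
```

That gives about 16.7 million IDs of each kind and about 281 * 10^12 combinations, in line with the "at least 280 T" on the slide.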
  • 40. Tools shape culture shapes tools. It is time for a new monitoring culture! Embrace machine learning! Monitor everything in your organisation! Throw off the shackles of fixed intervals! Come, join the revolution!
  • 42. What works well. We store about 200M data points in several thousand time series with no issues. tcollector decouples measurement from storage. Creating new metrics is really easy. You are free to choose your rhythm.
  • 43. Challenges. The UI is seriously lacking. No annotation support out of the box. No metadata for time series. Only 1 s time resolution (and only 1 value per second per time series).
  • 44. Salvation is coming. OpenTSDB 2 is around the corner: millisecond precision, annotations and metadata, improved API, improved UI.
  • 45. Friendly advice. Pick a naming scheme and stick to it. Use tags wisely (not more than 6 or 7 tags per data point). Use tcollector. Wait for openTSDB 2 ;-)
  • 46. Questions? Please contact me: oliver.hankeln@gutefrage.net, @mydalon. I'll upload the slides and tweet about it.