Handling realtime and analytic
workloads in a single cluster
with Hadoop and Cassandra
Piotr Kołaczkowski
pkolaczk@datastax.com
@pkolaczk
Basic Cassandra + Hadoop Integration
[Diagram: a Cassandra cluster of eight C* nodes and a Hadoop cluster (NameNode & JobTracker, six DataNodes), linked by CFIF and CFOF]
ColumnFamilyInputFormat
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
Key: ByteBuffer (row key)
Value: SortedMap<ByteBuffer, IColumn>, keyed by column name; each IColumn holds (column name, value, timestamp)
Each row is passed to the mapper as one input record:
Input Key: jim      Input Value: age: 36 car: camaro gender: M
Input Key: carol    Input Value: age: 37 car: subaru
Input Key: johnny   Input Value: age: 12 gender: M
Input Key: suzy     Input Value: age: 10 gender: F
CFIF – Wide Row Support
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
With wide row support, each column is passed to the mapper as a separate input record:
Input Key: jim      Input Value: age: 36
Input Key: jim      Input Value: car: camaro
Input Key: jim      Input Value: gender: M
Input Key: carol    Input Value: age: 37
Input Key: carol    Input Value: car: subaru
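The difference between the two read modes can be sketched with plain Java collections. This is a toy model only: it does not use the Cassandra or Hadoop APIs, and the class and method names (WideRowSketch, standardRecords, wideRowRecords) are hypothetical. It assumes that standard mode yields one record per row and wide-row mode yields one record per column, as the slides illustrate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of the two CFIF read modes; not the Cassandra API.
class WideRowSketch {

    // Standard mode: one record per row, carrying all of the row's columns.
    static List<String> standardRecords(Map<String, SortedMap<String, String>> table) {
        List<String> records = new ArrayList<>();
        for (Map.Entry<String, SortedMap<String, String>> row : table.entrySet()) {
            records.add(row.getKey() + " -> " + row.getValue());
        }
        return records;
    }

    // Wide-row mode: one record per (row key, column) pair.
    static List<String> wideRowRecords(Map<String, SortedMap<String, String>> table) {
        List<String> records = new ArrayList<>();
        for (Map.Entry<String, SortedMap<String, String>> row : table.entrySet()) {
            for (Map.Entry<String, String> column : row.getValue().entrySet()) {
                records.add(row.getKey() + " -> {" + column.getKey() + "=" + column.getValue() + "}");
            }
        }
        return records;
    }

    // Builds a column map, sorted by column name, from alternating names and values.
    static SortedMap<String, String> columns(String... nameValuePairs) {
        SortedMap<String, String> result = new TreeMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            result.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return result;
    }

    // The sample table used throughout the slides.
    static Map<String, SortedMap<String, String>> sampleTable() {
        Map<String, SortedMap<String, String>> table = new LinkedHashMap<>();
        table.put("jim", columns("age", "36", "car", "camaro", "gender", "M"));
        table.put("carol", columns("age", "37", "car", "subaru"));
        table.put("johnny", columns("age", "12", "gender", "M"));
        table.put("suzy", columns("age", "10", "gender", "F"));
        return table;
    }

    public static void main(String[] args) {
        Map<String, SortedMap<String, String>> table = sampleTable();
        System.out.println("Standard mode:");
        standardRecords(table).forEach(System.out::println);
        System.out.println("Wide-row mode:");
        wideRowRecords(table).forEach(System.out::println);
    }
}
```

For the sample table this gives 4 records in standard mode and 9 in wide-row mode, which is why wide-row mode suits rows too large to hold in memory as a single map.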
CFIF – Cassandra Secondary Index Support
IndexExpression expr =
new IndexExpression(
ByteBufferUtil.bytes("car"),
IndexOperator.EQ,
ByteBufferUtil.bytes("subaru")
);
ConfigHelper.setInputRange(
job.getConfiguration(),
Arrays.asList(expr)
);
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
ColumnFamilyOutputFormat
● Key: ByteBuffer (row key)
● Value: List<Mutation>
– Mutation: insert or delete a column
[Diagram: ColumnFamilyRecordWriter (write, queue, client) sending mutations over Thrift to the eight C* nodes of the Cassandra cluster]
CFOF – Creating Mutations
ByteBuffer rowkey = ByteBufferUtil.bytes("carol");
Column column = new Column();
column.name = ByteBufferUtil.bytes("age");
column.value = ByteBufferUtil.bytes(37);
// Thrift columns need a timestamp; microseconds since the epoch is the usual convention
column.setTimestamp(System.currentTimeMillis() * 1000);
List<Mutation> mutations = new ArrayList<Mutation>();
Mutation mutation = new Mutation();
mutation.column_or_supercolumn = new ColumnOrSuperColumn();
mutation.column_or_supercolumn.column = column;
mutations.add(mutation);
context.write(rowkey, mutations);
BulkOutputFormat
[Diagram: BulkRecordWriter writes into a memory buffer, which is flushed as SSTable 1, SSTable 2, ... SSTable N into the Hadoop temporary directory]
DataStax Enterprise:
Cassandra and Hadoop in a Single Cluster
Basic Features
● Single, simplified component
● Workload separation
● No SPOF
● Peer to peer
● JobTracker failover
● No additional Cassandra config
System Administrator's View
Address          DC         Rack   Workload       Status  State   Load      Owns    Token
                                                                                    148873535527910577765226390751398592512
101.202.204.101  Analytics  rack1  Analytics(JT)  Up      Normal  78,96 GB  12,50%  0
101.202.204.102  Analytics  rack1  Analytics(TT)  Up      Normal  82,65 GB  12,50%  21267647932558653966460912964485513216
101.202.204.103  Analytics  rack1  Analytics(TT)  Up      Normal  74,96 GB  12,50%  42535295865117307932921825928971026432
101.202.204.104  Analytics  rack1  Analytics(TT)  Up      Normal  78,79 GB  12,50%  63802943797675961899382738893456539648
101.202.204.105  Cassandra  rack1  Cassandra      Up      Normal  67,42 GB  12,50%  85070591730234615865843651857942052864
101.202.204.106  Cassandra  rack1  Cassandra      Up      Normal  60,86 GB  12,50%  106338239662793269832304564822427566080
101.202.204.107  Cassandra  rack1  Cassandra      Up      Normal  81,27 GB  12,50%  127605887595351923798765477786913079296
101.202.204.108  Cassandra  rack1  Cassandra      Up      Normal  77,17 GB  12,50%  148873535527910577765226390751398592512
Easy monitoring of your nodes, regardless of their workload type
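The tokens in the ring listing above are evenly spaced, so each of the eight nodes owns 12.5% of the ring. A minimal sketch of how such tokens can be computed is shown here, assuming a token range of 0 to 2^127 (the RandomPartitioner range, which the slides do not name explicitly). The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Evenly spaced tokens over an assumed 0 .. 2^127 token range.
class EvenTokensSketch {

    static List<BigInteger> evenTokens(int nodeCount) {
        BigInteger rangeSize = BigInteger.ONE.shiftLeft(127); // 2^127
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            // token_i = i * 2^127 / nodeCount
            tokens.add(rangeSize.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line for an eight-node ring.
        for (BigInteger token : evenTokens(8)) {
            System.out.println(token);
        }
    }
}
```

With eight nodes the second token is 2^127 / 8 = 21267647932558653966460912964485513216, matching the listing.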
Wait, but where are my files?
[Diagram: Hadoop M/R on top of HDFS, versus Hadoop M/R on top of CFS on top of the Cassandra server]
Cassandra File System Properties
● Decentralized
● Replicated
● HDFS compatible
– compatible with Hadoop filesystem utilities
– allows for running M/R programs on DSE without any change
● Compressed
CFS Architecture
CFS Compaction
● Keeps track of deleted rows (blocks)
● When all blocks in an SSTable have been removed, deletes the whole SSTable
[Diagram: Cassandra Storage holding blocks 1 to 10, grouped under timestamps ts 1 to ts 4, with one group marked X]
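The compaction rule above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the DSE implementation: the class CfsCompactionSketch and its methods are hypothetical, and SSTables are reduced to sets of live block ids.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the CFS compaction rule; not the DSE implementation.
class CfsCompactionSketch {

    // SSTable name -> ids of the blocks in it that are still live.
    private final Map<String, Set<Integer>> liveBlocks = new HashMap<>();

    void addSSTable(String name, int... blockIds) {
        Set<Integer> blocks = new HashSet<>();
        for (int id : blockIds) {
            blocks.add(id);
        }
        liveBlocks.put(name, blocks);
    }

    // Marks a block as deleted. Once no live block remains, the whole
    // SSTable is dropped. Returns true if the SSTable was dropped.
    boolean deleteBlock(String sstable, int blockId) {
        Set<Integer> blocks = liveBlocks.get(sstable);
        if (blocks == null) {
            return false; // unknown or already dropped SSTable
        }
        blocks.remove(blockId);
        if (blocks.isEmpty()) {
            liveBlocks.remove(sstable);
            return true;
        }
        return false;
    }

    boolean contains(String sstable) {
        return liveBlocks.containsKey(sstable);
    }
}
```

The point of the rule is that an SSTable whose blocks are all deleted can be removed as a unit, without rewriting any surviving data.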
Hive Integration
● CassandraHiveMetaStore
– stores Hive database metadata in Cassandra
– no need to run a separate RDBMS
● CassandraStorageHandler
– allows for direct access to C* tables with CFIF and CFOF
Hive Integration – Example
CREATE EXTERNAL TABLE MyHiveTable(row_key string, col1 string, col2 string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
TBLPROPERTIES ("cassandra.ks.name" = "MyCassandraKS");
SELECT count(*) FROM MyHiveTable;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201306041030_0001, Tracking URL = http://192.168.123.10:50030/jobdetails.jsp?jobid=job_201306041030_0001
Kill Command = /usr/bin/dse hadoop job -Dmapred.job.tracker=192.168.123.10:8012 -kill job_201306041030_0001
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1
2013-06-04 15:11:54,573 Stage-1 map = 0%, reduce = 0%
2013-06-04 15:11:58,622 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec
2013-06-04 15:11:59,691 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec
...
2013-06-04 15:12:28,288 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
2013-06-04 15:12:29,304 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
2013-06-04 15:12:30,330 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
2013-06-04 15:12:31,339 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
MapReduce Total cumulative CPU time: 31 seconds 910 msec
Ended Job = job_201306041030_0001
MapReduce Jobs Launched:
Job 0: Map: 9 Reduce: 1 Cumulative CPU: 31.91 sec HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 31 seconds 910 msec
OK
1000000
Time taken: 46.246 seconds
Custom Column Mapping
CREATE EXTERNAL TABLE Users(
userid string, name string, email string, phone string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH
SERDEPROPERTIES (
"cassandra.columns.mapping" = ":key,user_name,primary_email,home_phone");
Cassandra:  row key  user_name  primary_email  home_phone
Hive:       userid   name       email          phone
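The mapping above pairs Hive columns with Cassandra columns by position, with ":key" standing for the row key. A minimal sketch of that pairing follows. It only illustrates the positional rule shown on the slide; how CassandraStorageHandler actually parses the property is not shown in the source, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrates positional pairing of Hive columns with a
// "cassandra.columns.mapping" value; not the storage handler's own code.
class ColumnMappingSketch {

    // Returns Hive column name -> Cassandra column name, in declaration order.
    static Map<String, String> pair(String hiveColumns, String cassandraMapping) {
        String[] hive = hiveColumns.split(",");
        String[] cassandra = cassandraMapping.split(",");
        if (hive.length != cassandra.length) {
            throw new IllegalArgumentException(
                "Expected " + hive.length + " mapped columns but got " + cassandra.length);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < hive.length; i++) {
            result.put(hive[i].trim(), cassandra[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = pair(
            "userid,name,email,phone",
            ":key,user_name,primary_email,home_phone");
        mapping.forEach((hive, cassandra) -> System.out.println(hive + " <- " + cassandra));
    }
}
```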
Ad

More Related Content

What's hot (6)

SevillaR meetup: dplyr and magrittr
SevillaR meetup: dplyr and magrittrSevillaR meetup: dplyr and magrittr
SevillaR meetup: dplyr and magrittr
Romain Francois
 
Postgres performance for humans
Postgres performance for humansPostgres performance for humans
Postgres performance for humans
Craig Kerstiens
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
Altinity Ltd
 
Embedded R Execution using SQL
Embedded R Execution using SQLEmbedded R Execution using SQL
Embedded R Execution using SQL
Brendan Tierney
 
Scaling the #2ndhalf
Scaling the #2ndhalfScaling the #2ndhalf
Scaling the #2ndhalf
Salo Shp
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
Jeffrey Breen
 
SevillaR meetup: dplyr and magrittr
SevillaR meetup: dplyr and magrittrSevillaR meetup: dplyr and magrittr
SevillaR meetup: dplyr and magrittr
Romain Francois
 
Postgres performance for humans
Postgres performance for humansPostgres performance for humans
Postgres performance for humans
Craig Kerstiens
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
Altinity Ltd
 
Embedded R Execution using SQL
Embedded R Execution using SQLEmbedded R Execution using SQL
Embedded R Execution using SQL
Brendan Tierney
 
Scaling the #2ndhalf
Scaling the #2ndhalfScaling the #2ndhalf
Scaling the #2ndhalf
Salo Shp
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
Jeffrey Breen
 

Viewers also liked (20)

Hadoop+Cassandra_Integration
Hadoop+Cassandra_IntegrationHadoop+Cassandra_Integration
Hadoop+Cassandra_Integration
Joyabrata Das
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in Cassandra
Jairam Chandar
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Modern Data Stack France
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
Ted Dunning
 
M7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal HausenblasM7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal Hausenblas
Modern Data Stack France
 
IBM Stream au Hadoop User Group
IBM Stream au Hadoop User GroupIBM Stream au Hadoop User Group
IBM Stream au Hadoop User Group
Modern Data Stack France
 
Analyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotAnalyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien Cabot
Modern Data Stack France
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
Modern Data Stack France
 
Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)
Modern Data Stack France
 
Cassandra Hadoop Best Practices by Jeremy Hanna
Cassandra Hadoop Best Practices by Jeremy HannaCassandra Hadoop Best Practices by Jeremy Hanna
Cassandra Hadoop Best Practices by Jeremy Hanna
Modern Data Stack France
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data Systems
Ahmad Jawwad
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
Andy Petrella
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on Hadoop
Hortonworks
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Natalino Busa
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Cedric CARBONE
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
Duyhai Doan
 
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés PeñaStratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Big Data Spain
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
Jay Patel
 
Hadoop+Cassandra_Integration
Hadoop+Cassandra_IntegrationHadoop+Cassandra_Integration
Hadoop+Cassandra_Integration
Joyabrata Das
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in Cassandra
Jairam Chandar
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Modern Data Stack France
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
Ted Dunning
 
Analyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotAnalyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien Cabot
Modern Data Stack France
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
Modern Data Stack France
 
Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)
Modern Data Stack France
 
Cassandra Hadoop Best Practices by Jeremy Hanna
Cassandra Hadoop Best Practices by Jeremy HannaCassandra Hadoop Best Practices by Jeremy Hanna
Cassandra Hadoop Best Practices by Jeremy Hanna
Modern Data Stack France
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data Systems
Ahmad Jawwad
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
Andy Petrella
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on Hadoop
Hortonworks
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Natalino Busa
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Cedric CARBONE
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
Duyhai Doan
 
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés PeñaStratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Big Data Spain
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
Jay Patel
 
Ad

Similar to Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski (20)

Cassandra, web scale no sql data platform
Cassandra, web scale no sql data platformCassandra, web scale no sql data platform
Cassandra, web scale no sql data platform
Marko Švaljek
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher Batey
Spark Summit
 
Cram
CramCram
Cram
JamesBonfield
 
Up and running with python
Up and running with pythonUp and running with python
Up and running with python
Barry DeCicco
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
Masayuki Matsushita
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
Patrick McFadin
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
it-people
 
MongoDB and DigitalOcean Automation with Cloud Manager
MongoDB and DigitalOcean Automation with Cloud ManagerMongoDB and DigitalOcean Automation with Cloud Manager
MongoDB and DigitalOcean Automation with Cloud Manager
Jay Gordon
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Databricks
 
Quick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to knowQuick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to know
Rafał Hryniewski
 
Loan-defaulters-predictions(Python codes)
Loan-defaulters-predictions(Python codes)Loan-defaulters-predictions(Python codes)
Loan-defaulters-predictions(Python codes)
GraceFalabi
 
Streaming Data from Scylla to Kafka
Streaming Data from Scylla to KafkaStreaming Data from Scylla to Kafka
Streaming Data from Scylla to Kafka
ScyllaDB
 
Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019
Ibrar Ahmed
 
Murtaugh 2022 Appl Comp Genomics Tidyverse lecture.pptx-1.pptx
Murtaugh 2022 Appl Comp Genomics Tidyverse lecture.pptx-1.pptxMurtaugh 2022 Appl Comp Genomics Tidyverse lecture.pptx-1.pptx
Murtaugh 2022 Appl Comp Genomics Tidyverse lecture.pptx-1.pptx
oliversen
 
Redo logfile addition in oracle rac 12c
Redo logfile addition in oracle rac 12cRedo logfile addition in oracle rac 12c
Redo logfile addition in oracle rac 12c
Debasish Nayak
 
Pandas in Python for Data Exploration .pdf
Pandas in Python for Data Exploration .pdfPandas in Python for Data Exploration .pdf
Pandas in Python for Data Exploration .pdf
sejalkadam21
 
Distributed Computing for Everyone
Distributed Computing for EveryoneDistributed Computing for Everyone
Distributed Computing for Everyone
Giovanna Roda
 
Big Data Analytics Lab File
Big Data Analytics Lab FileBig Data Analytics Lab File
Big Data Analytics Lab File
Uttam Singh Chaudhary
 
Cassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break GlassCassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break Glass
DataStax
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
aaronmorton
 
Cassandra, web scale no sql data platform
Cassandra, web scale no sql data platformCassandra, web scale no sql data platform
Cassandra, web scale no sql data platform
Marko Švaljek
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher Batey
Spark Summit
 
Up and running with python
Up and running with pythonUp and running with python
Up and running with python
Barry DeCicco
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
Masayuki Matsushita
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
Patrick McFadin
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
it-people
 
MongoDB and DigitalOcean Automation with Cloud Manager
MongoDB and DigitalOcean Automation with Cloud ManagerMongoDB and DigitalOcean Automation with Cloud Manager
MongoDB and DigitalOcean Automation with Cloud Manager
Jay Gordon
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Databricks
 
Quick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to knowQuick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to know
Rafał Hryniewski
 
Loan-defaulters-predictions(Python codes)
Loan-defaulters-predictions(Python codes)Loan-defaulters-predictions(Python codes)
Loan-defaulters-predictions(Python codes)
GraceFalabi
 
Streaming Data from Scylla to Kafka
Streaming Data from Scylla to KafkaStreaming Data from Scylla to Kafka
Streaming Data from Scylla to Kafka
ScyllaDB
 
Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019
Ibrar Ahmed
 
Murtaugh 2022 Appl Comp Genomics Tidyverse lecture.pptx-1.pptx
Murtaugh 2022 Appl Comp Genomics Tidyverse lecture.pptx-1.pptxMurtaugh 2022 Appl Comp Genomics Tidyverse lecture.pptx-1.pptx
Murtaugh 2022 Appl Comp Genomics Tidyverse lecture.pptx-1.pptx
oliversen
 
Redo logfile addition in oracle rac 12c
Redo logfile addition in oracle rac 12cRedo logfile addition in oracle rac 12c
Redo logfile addition in oracle rac 12c
Debasish Nayak
 
Pandas in Python for Data Exploration .pdf
Pandas in Python for Data Exploration .pdfPandas in Python for Data Exploration .pdf
Pandas in Python for Data Exploration .pdf
sejalkadam21
 
Distributed Computing for Everyone
Distributed Computing for EveryoneDistributed Computing for Everyone
Distributed Computing for Everyone
Giovanna Roda
 
Cassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break GlassCassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break Glass
DataStax
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
aaronmorton
 
Ad

More from Modern Data Stack France (20)

Stash - Data FinOPS
Stash - Data FinOPSStash - Data FinOPS
Stash - Data FinOPS
Modern Data Stack France
 
Vue d'ensemble Dremio
Vue d'ensemble DremioVue d'ensemble Dremio
Vue d'ensemble Dremio
Modern Data Stack France
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
Modern Data Stack France
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
Modern Data Stack France
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017
Modern Data Stack France
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Modern Data Stack France
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
Modern Data Stack France
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
Modern Data Stack France
 
Hug janvier 2016 -EDF
Hug   janvier 2016 -EDFHug   janvier 2016 -EDF
Hug janvier 2016 -EDF
Modern Data Stack France
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlus
Modern Data Stack France
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
Modern Data Stack France
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
Modern Data Stack France
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Modern Data Stack France
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Modern Data Stack France
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Modern Data Stack France
 
Spark dataframe
Spark dataframeSpark dataframe
Spark dataframe
Modern Data Stack France
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandation
Modern Data Stack France
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Modern Data Stack France
 
Spark meetup at viadeo
Spark meetup at viadeoSpark meetup at viadeo
Spark meetup at viadeo
Modern Data Stack France
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Modern Data Stack France
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
Modern Data Stack France
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017
Modern Data Stack France
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...

Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski

  • 1. Handling realtime and analytic workloads in a single cluster with Hadoop and Cassandra. Piotr Kołaczkowski, pkolaczk@datastax.com, @pkolaczk
  • 2. Basic Cassandra + Hadoop Integration. [Diagram: a Cassandra cluster of eight C* nodes beside a Hadoop cluster (NameNode & JobTracker plus six DataNodes), with CFIF and CFOF linking the two.]
  • 3. ColumnFamilyInputFormat
      Key: ByteBuffer (row key)
      Value: SortedMap<ByteBuffer, IColumn>, keyed by column name; each IColumn carries (column name, value, timestamp)
      Sample column family (row key, then columns):
        jim     age: 36  car: camaro  gender: M
        carol   age: 37  car: subaru
        johnny  age: 12  gender: M
        suzy    age: 10  gender: F
  • 4. ColumnFamilyInputFormat. Input Key: jim. Input Value: age: 36 car: camaro gender: M
  • 5. ColumnFamilyInputFormat. Input Key: carol. Input Value: age: 37 car: subaru
  • 6. ColumnFamilyInputFormat. Input Key: johnny. Input Value: age: 12 gender: M
  • 7. ColumnFamilyInputFormat. Input Key: suzy. Input Value: age: 10 gender: F
  • 8. CFIF – Wide Row Support. Input Key: jim. Input Value: age: 36
  • 9. CFIF – Wide Row Support. Input Key: jim. Input Value: car: camaro
  • 10. CFIF – Wide Row Support. Input Key: jim. Input Value: gender: M
  • 11. CFIF – Wide Row Support. Input Key: carol. Input Value: age: 37
  • 12. CFIF – Wide Row Support. Input Key: carol. Input Value: car: subaru
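The difference between the two CFIF modes on slides 4 to 12 can be sketched in plain Java. This is only a model under stated assumptions: it uses String keys and values instead of ByteBuffer and IColumn, it does not call any Cassandra or Hadoop class, and the names CfifModesSketch, rowRecords and wideRowRecords are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CfifModesSketch {

    /** The sample column family from the slides: row key -> columns sorted by name. */
    static Map<String, TreeMap<String, String>> sampleTable() {
        Map<String, TreeMap<String, String>> table = new LinkedHashMap<>();
        table.put("jim", row("age", "36", "car", "camaro", "gender", "M"));
        table.put("carol", row("age", "37", "car", "subaru"));
        table.put("johnny", row("age", "12", "gender", "M"));
        table.put("suzy", row("age", "10", "gender", "F"));
        return table;
    }

    /** Builds one row from alternating column names and values. */
    static TreeMap<String, String> row(String... nameValuePairs) {
        TreeMap<String, String> columns = new TreeMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            columns.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return columns;
    }

    /** Default mode: one input record per row, carrying all of the row's columns. */
    static List<String> rowRecords(Map<String, TreeMap<String, String>> table) {
        List<String> records = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            records.add(row.getKey() + " -> " + row.getValue());
        }
        return records;
    }

    /** Wide-row mode: one input record per column, with the row key repeated. */
    static List<String> wideRowRecords(Map<String, TreeMap<String, String>> table) {
        List<String> records = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            for (Map.Entry<String, String> column : row.getValue().entrySet()) {
                records.add(row.getKey() + " -> {" + column.getKey() + "=" + column.getValue() + "}");
            }
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println("Row mode:");
        for (String r : rowRecords(sampleTable())) System.out.println("  " + r);
        System.out.println("Wide-row mode:");
        for (String r : wideRowRecords(sampleTable())) System.out.println("  " + r);
    }
}
```

In row mode the mapper sees four records for the sample data; in wide-row mode it sees nine, one per column, which keeps memory use bounded when a single row holds very many columns.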
  • 13. CFIF – Cassandra Secondary Index Support
      IndexExpression expr = new IndexExpression(
          ByteBufferUtil.bytes("car"),
          IndexOperator.EQ,
          ByteBufferUtil.bytes("subaru"));
      ConfigHelper.setInputRange(
          job.getConfiguration(),
          Arrays.asList(expr));
      In the sample column family, only carol has car: subaru.
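The effect of the EQ index expression on slide 13 can be illustrated with a small plain-Java model. It is a sketch only: it filters an in-memory map rather than using Cassandra's secondary index, and IndexFilterSketch and matchingKeys are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IndexFilterSketch {

    /** Returns the keys of rows whose column `name` equals `value` (an EQ expression). */
    static List<String> matchingKeys(Map<String, Map<String, String>> table, String name, String value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            // Rows without the column yield null and never match.
            if (value.equals(row.getValue().get(name))) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    /** The sample column family from the slides. */
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("jim", row("age", "36", "car", "camaro", "gender", "M"));
        table.put("carol", row("age", "37", "car", "subaru"));
        table.put("johnny", row("age", "12", "gender", "M"));
        table.put("suzy", row("age", "10", "gender", "F"));
        return table;
    }

    /** Builds one row from alternating column names and values. */
    static Map<String, String> row(String... nameValuePairs) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            columns.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Only rows passing the filter would be handed to the mapper.
        System.out.println(matchingKeys(sampleTable(), "car", "subaru"));
    }
}
```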
  • 14. ColumnFamilyOutputFormat
      ● Key: ByteBuffer (row key)
      ● Value: List<Mutation>
        – Mutation: insert or delete a column
      [Diagram: ColumnFamilyRecordWriter places writes on a write queue; a Thrift client sends them to the Cassandra cluster.]
  • 15. CFOF – Creating Mutations
      ByteBuffer rowkey = ByteBufferUtil.bytes("carol");
      Column column = new Column();
      column.name = ByteBufferUtil.bytes("age");
      column.value = ByteBufferUtil.bytes(37);
      // Cassandra rejects a column without a timestamp; the setter also marks the field as set.
      column.setTimestamp(System.currentTimeMillis());
      List<Mutation> mutations = new ArrayList<Mutation>();
      Mutation mutation = new Mutation();
      mutation.column_or_supercolumn = new ColumnOrSuperColumn();
      mutation.column_or_supercolumn.column = column;
      mutations.add(mutation);
      context.write(rowkey, mutations);
  • 16. BulkOutputFormat. [Diagram: writes go to the BulkRecordWriter memory buffer, which is flushed as SSTable 1, SSTable 2, ... SSTable N into a Hadoop temporary directory.]
  • 17. DataStax Enterprise: Cassandra and Hadoop in a Single Cluster
  • 18. Basic Features
      ● Single, simplified component
      ● Workload separation
      ● No SPOF
      ● Peer to peer
      ● JobTracker failover
      ● No additional Cassandra config
  • 19. System Administrator's View
      Address          DC         Rack   Workload       Status  State   Load      Owns    Token
                                                                                          148873535527910577765226390751398592512
      101.202.204.101  Analytics  rack1  Analytics(JT)  Up      Normal  78,96 GB  12,50%  0
      101.202.204.102  Analytics  rack1  Analytics(TT)  Up      Normal  82,65 GB  12,50%  21267647932558653966460912964485513216
      101.202.204.103  Analytics  rack1  Analytics(TT)  Up      Normal  74,96 GB  12,50%  42535295865117307932921825928971026432
      101.202.204.104  Analytics  rack1  Analytics(TT)  Up      Normal  78,79 GB  12,50%  63802943797675961899382738893456539648
      101.202.204.105  Cassandra  rack1  Cassandra      Up      Normal  67,42 GB  12,50%  85070591730234615865843651857942052864
      101.202.204.106  Cassandra  rack1  Cassandra      Up      Normal  60,86 GB  12,50%  106338239662793269832304564822427566080
      101.202.204.107  Cassandra  rack1  Cassandra      Up      Normal  81,27 GB  12,50%  127605887595351923798765477786913079296
      101.202.204.108  Cassandra  rack1  Cassandra      Up      Normal  77,17 GB  12,50%  148873535527910577765226390751398592512
      Easy monitoring of your nodes, regardless of their workload type
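The tokens in the listing on slide 19 equal i · 2^127 / 8 for i = 0..7, which is consistent with eight evenly spaced tokens in RandomPartitioner's 0..2^127 range, each node owning 1/8 = 12.5% of the ring. The slides do not say how the tokens were chosen, so treat the following as a sketch of that arithmetic rather than the deck's procedure; EvenTokensSketch and evenTokens are hypothetical names.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class EvenTokensSketch {

    /** Size of the token space assumed here: 2^127 (RandomPartitioner). */
    static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(127);

    /** Returns nodeCount tokens spaced evenly around the ring, starting at 0. */
    static List<BigInteger> evenTokens(int nodeCount) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(RING_SIZE.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints eight tokens, one per line, for comparison with the Token column.
        for (BigInteger token : evenTokens(8)) {
            System.out.println(token);
        }
    }
}
```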
  • 20. Wait, but where are my files? [Diagram: Hadoop M/R on top of HDFS, compared with Hadoop M/R on top of CFS inside the Cassandra server.]
  • 21. Cassandra File System Properties
      ● Decentralized
      ● Replicated
      ● HDFS compatible
        – compatible with Hadoop filesystem utilities
        – allows for running M/R programs on DSE without any change
      ● Compressed
  • 23. CFS Compaction
      ● Keeps track of deleted rows (blocks)
      ● When all blocks in an SSTable have been removed, deletes the whole SSTable
      [Diagram: Cassandra storage with blocks 1 to 10 laid out across SSTables stamped ts 1 to ts 4; an X marks a removed SSTable.]
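The rule on slide 23 (drop an SSTable once every block in it has been deleted) can be modelled in a few lines of plain Java. This is an illustrative sketch, not CFS's actual compaction code: CfsCompactionSketch, addSSTable and deleteBlock are hypothetical names, and blocks are represented by integer ids.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class CfsCompactionSketch {

    /** SSTable name -> ids of the blocks in it that are still live. */
    private final Map<String, Set<Integer>> liveBlocks = new LinkedHashMap<>();

    /** Registers an SSTable together with the blocks it stores. */
    void addSSTable(String name, int... blockIds) {
        Set<Integer> blocks = new HashSet<>();
        for (int id : blockIds) {
            blocks.add(id);
        }
        liveBlocks.put(name, blocks);
    }

    /**
     * Marks a block as deleted. Any SSTable left without live blocks is
     * dropped as a whole. Returns true if at least one SSTable was dropped.
     */
    boolean deleteBlock(int blockId) {
        boolean dropped = false;
        Iterator<Map.Entry<String, Set<Integer>>> it = liveBlocks.entrySet().iterator();
        while (it.hasNext()) {
            Set<Integer> blocks = it.next().getValue();
            if (blocks.remove(blockId) && blocks.isEmpty()) {
                it.remove();
                dropped = true;
            }
        }
        return dropped;
    }

    /** Names of the SSTables that still hold at least one live block. */
    Set<String> sstables() {
        return liveBlocks.keySet();
    }

    public static void main(String[] args) {
        CfsCompactionSketch cfs = new CfsCompactionSketch();
        cfs.addSSTable("sstable-1", 1, 2);
        cfs.addSSTable("sstable-2", 3);
        cfs.deleteBlock(1);
        System.out.println(cfs.sstables()); // sstable-1 still has block 2
        cfs.deleteBlock(2);
        System.out.println(cfs.sstables()); // sstable-1 is gone
    }
}
```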
  • 24. Hive Integration
      ● CassandraHiveMetaStore
        – stores Hive database metadata in Cassandra
        – no need to run a separate RDBMS
      ● CassandraStorageHandler
        – allows for direct access to C* tables with CFIF and CFOF
  • 25. Hive Integration – Example
      CREATE EXTERNAL TABLE MyHiveTable(row_key string, col1 string, col2 string)
      STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
      TBLPROPERTIES ("cassandra.ks.name" = "MyCassandraKS");

      SELECT count(*) FROM MyHiveTable;

      Total MapReduce jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks determined at compile time: 1
      In order to change the average load for a reducer (in bytes):
        set hive.exec.reducers.bytes.per.reducer=<number>
      In order to limit the maximum number of reducers:
        set hive.exec.reducers.max=<number>
      In order to set a constant number of reducers:
        set mapred.reduce.tasks=<number>
      Starting Job = job_201306041030_0001, Tracking URL = http://192.168.123.10:50030/jobdetails.jsp?jobid=job_201306041030_0001
      Kill Command = /usr/bin/dse hadoop job -Dmapred.job.tracker=192.168.123.10:8012 -kill job_201306041030_0001
      Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1
      2013-06-04 15:11:54,573 Stage-1 map = 0%, reduce = 0%
      2013-06-04 15:11:58,622 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec
      2013-06-04 15:11:59,691 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec
      ...
      2013-06-04 15:12:28,288 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
      2013-06-04 15:12:29,304 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
      2013-06-04 15:12:30,330 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
      2013-06-04 15:12:31,339 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
      MapReduce Total cumulative CPU time: 31 seconds 910 msec
      Ended Job = job_201306041030_0001
      MapReduce Jobs Launched:
      Job 0: Map: 9 Reduce: 1 Cumulative CPU: 31.91 sec HDFS Read: 0 HDFS Write: 0 SUCCESS
      Total MapReduce CPU Time Spent: 31 seconds 910 msec
      OK
      1000000
      Time taken: 46.246 seconds
  • 26. Custom Column Mapping
      CREATE EXTERNAL TABLE Users(
        userid string, name string, email string, phone string)
      STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
      WITH SERDEPROPERTIES (
        "cassandra.columns.mapping" = ":key,user_name,primary_email,home_phone");

      Cassandra:  row key  user_name  primary_email  home_phone
      Hive:       userid   name       email          phone
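Slide 26 pairs Hive columns with Cassandra columns by position. A minimal plain-Java sketch of that pairing follows; it is an illustration of the positional rule only and does not reproduce the storage handler's parsing. ColumnMappingSketch and hiveToCassandra are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ColumnMappingSketch {

    /**
     * Pairs each Hive column with the Cassandra column at the same position
     * in a comma-separated mapping string such as the value of
     * "cassandra.columns.mapping". ":key" stands for the Cassandra row key.
     */
    static Map<String, String> hiveToCassandra(String[] hiveColumns, String mapping) {
        String[] cassandraColumns = mapping.split(",");
        if (cassandraColumns.length != hiveColumns.length) {
            throw new IllegalArgumentException(
                "Mapping has " + cassandraColumns.length + " entries but the table has "
                + hiveColumns.length + " columns");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < hiveColumns.length; i++) {
            result.put(hiveColumns[i], cassandraColumns[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] hiveColumns = {"userid", "name", "email", "phone"};
        Map<String, String> mapping =
            hiveToCassandra(hiveColumns, ":key,user_name,primary_email,home_phone");
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            System.out.println("Hive " + e.getKey() + " <- Cassandra " + e.getValue());
        }
    }
}
```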