© 2016 MapR Technologies 1© 2014 MapR Technologies
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, Zookeeper & others
VP of Incubator at Apache Foundation
Email tdunning@apache.org tdunning@maprtech.com
Twitter @ted_dunning
Hashtags today: #stratahadoop #ojai
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today, 3:40PM @ MapR booth
http://bit.ly/mapr-ebook-streams
Goals
• Real-time or near-time
– Includes situations with deadlines
– Also includes situations where delay is simply undesirable
– Even includes situations where delay is just fine
• Micro-services
– Streaming is a convenient idiom for design
– Micro-services … you know we wanted it
– Service isolation is a key requirement
Real-time or Near-time?
• The real point is flow versus state (see talk later today)
• One consequence of flow-based computing is that real-time and near-time become relatively easy
• Life may be a bitch, but it doesn’t happen in batches!
Agenda
• Background / micro-services
• Global requirements
• Scale
A microservice is loosely coupled with bounded context
How to Couple Services and Break micro-ness
• Shared schemas, relational stores
• Ad hoc communication between services
• Enterprise service busses
• Brittle protocols
• Poor protocol versioning
How to Decouple Services
• Use self-describing data
• Private databases
• Infrastructural communication between services
• Use modern protocols
• Adopt future-proof protocol practices
• Use shared storage where necessary due to scale
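One way to read "self-describing data" and "future-proof protocol practices" together is a tolerant reader: each message carries its own field names, and a consumer picks out the fields it understands and ignores the rest. The following is a minimal sketch under that reading, assuming JSON messages. The field names (`card`, `location`, `t`) echo the card example later in the deck; `currency`, `terminal`, and the helper `parse_card_event` are hypothetical and not part of any library.

```python
import json


def parse_card_event(raw: str) -> dict:
    """Extract the fields this service needs, ignoring any it does not know."""
    msg = json.loads(raw)
    return {
        "card": msg["card"],          # required field
        "location": msg["location"],  # required field
        "t": msg["t"],                # required field
        # Optional field added by a newer producer (hypothetical);
        # the default keeps older messages valid.
        "currency": msg.get("currency", "unknown"),
    }


# An old-style message and a newer one with extra fields.
old_message = '{"card": "1234", "location": "store-7", "t": 1000}'
new_message = (
    '{"card": "1234", "location": "store-7", "t": 1001, '
    '"currency": "USD", "terminal": "POS 2"}'
)

old_event = parse_card_event(old_message)
new_event = parse_card_event(new_message)
```

Both messages parse with the same reader, so producers can add fields without forcing every consumer to upgrade at the same moment.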
What is the Right Structure for Flow Compute?
• Traditional message queues?
– Message queues are the classic answer
– Key feature/bug is out-of-order acknowledgement
– Many implementations
– You pay a huge performance hit for persistence
• Kafka-esque Logs?
– Logs are like queues, but with ordering
– Out of order consumption is possible, acknowledgement not so much
– Canonical base implementation is Kafka
– Performance plus persistence
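The difference between a queue and a log can be sketched in a few lines. In the toy below, messages stay in the log after being read, and each consumer only tracks an offset, so consumers are independent and can replay. This is an illustration of the idea only, not how Kafka or MapR Streams is implemented; the class `Log` and the consumer names are made up for the example.

```python
class Log:
    """Append-only, ordered log; messages persist after being read."""

    def __init__(self):
        self._messages = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def poll(self, consumer, max_messages=10):
        """Return the next batch for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Rewind or skip ahead; this is what makes replay possible."""
        self._offsets[consumer] = offset


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

first_read = log.poll("fraud-detector")               # reads everything so far
second_consumer = log.poll("updater", max_messages=2)  # independent position
log.seek("fraud-detector", 1)                          # rewind
replayed = log.poll("fraud-detector")                  # re-reads from offset 1
```

Because the only per-consumer state is an offset, acknowledgement is in order by construction, while consumption can jump around via `seek`.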
Scenarios
Profile Database
The task
[Diagram: POS 1 and POS 2 each submit (location, t, card #) and each needs a yes/no answer]
Traditional Solution
[Diagram: POS 1..n, Fraud detector, Last card use]
What Happens Next?
[Diagram: three POS 1..n / Fraud detector pairs and a single Last card use]
How to Get Service Isolation
[Diagram: POS 1..n, Fraud detector, Last card use, Updater, card activity]
New Uses of Data
[Diagram: POS 1..n, Fraud detector, Last card use, Updater, Card location history, Other, card activity]
Scaling Through Isolation
[Diagram: two copies of POS 1..n, Fraud detector, Updater and Last card use, sharing one card activity]
Lessons
• De-coupling and isolation are key
• Private data stores/tables are important,
– but local storage of private data is a bug
• Propagate events, not table updates
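"Propagate events, not table updates" can be sketched with the card example. The fraud detector answers yes/no from its own table and publishes each event to a card activity stream; a separate updater folds those events into the last-card-use table, and any other consumer can derive its own view from the same stream. This is a toy sketch: the list `card_activity` stands in for a stream, and the 60-time-unit rule is an invented assumption, not the fraud logic from the talk.

```python
card_activity = []  # stand-in for the "card activity" stream


def fraud_detector(event, last_use):
    """Decide yes/no from a private table, then publish the event."""
    previous = last_use.get(event["card"])
    # Toy rule (an assumption): reject if the same card shows up at a
    # different location less than 60 time units after its last use.
    suspicious = (
        previous is not None
        and previous["location"] != event["location"]
        and event["t"] - previous["t"] < 60
    )
    card_activity.append(event)  # propagate the event, not a table update
    return not suspicious


def updater(stream, start, table):
    """Consume events from offset `start` and fold them into a private table."""
    for event in stream[start:]:
        table[event["card"]] = {"location": event["location"], "t": event["t"]}
    return len(stream)  # next offset to consume from


last_card_use = {}
offset = 0
ok1 = fraud_detector({"card": "1234", "location": "Tokyo", "t": 0}, last_card_use)
offset = updater(card_activity, offset, last_card_use)
ok2 = fraud_detector({"card": "1234", "location": "Singapore", "t": 30}, last_card_use)
offset = updater(card_activity, offset, last_card_use)

# A new use of the same data: another consumer builds a location history.
location_history = {}
for e in card_activity:
    location_history.setdefault(e["card"], []).append(e["location"])
```

The location history was added without touching the fraud detector or the updater, which is the isolation the slides argue for.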
Scenarios
IoT Data Aggregation
Basic Situation
[Diagram: multiple locations, each with many pumps, producing pump data]
What Does a Pump Look Like
[Diagram: pump with inlet, outlet and motor]
• Inlet: Temperature, Pressure, Flow
• Outlet: Temperature, Pressure, Flow
• Motor: Winding temperature, Voltage, Current
Basic Architecture Reflects Business Structure
[Diagram: four elements labeled pump data]
Lessons
• Data architecture should reflect business structure
• Even very modest designs involve multiple data centers
• Schemas cannot be frozen in the real world
• Security must follow data ownership
Scenarios
Global Data Recovery
[Diagram sequence: Tokyo and Corporate HQ; then Singapore, Tokyo and Corporate HQ]
Lessons
• Arbitrary number of topics important for simplicity + performance
• Updates happen in many places
• Mobility implies change in replication patterns
• Multi-master updates simplify design massively
Converged Requirements
What Have We Learned?
• Need persistence and performance
– Possibly for years and to 100’s of millions t/s
• Must have convergence
– Need files, tables AND streams
– Need volumes, snapshots, mirrors, permissions and …
• Must have platform security
– Cannot depend on perimeter
– Must follow business structure
• Must have global scale and scope
– Millions of topics for natural designs
– Multi-master replication and update
The Importance of Common API’s
• Commonality and interoperability are critical
– Compare Hadoop eco-system and the NoSQL world
• Table stakes
– Persistence
– Performance
– Polymorphism
• Major trend so far is to adopt Kafka API
– 0.9 API and beyond remove major abstraction leaks
– Kafka API supported by all major Hadoop vendors
What we do
Evolution of Data Storage
[Chart over four slides: axes Functionality, Compatibility, Scalability; systems shown: Linux, POSIX, Hadoop]
• Over decades of progress, Unix-based systems have set the standard for compatibility and functionality
• Hadoop achieves much higher scalability by trading away essentially all of this compatibility
• MapR enhanced Apache Hadoop by restoring the compatibility while increasing scalability and performance
• Adding tables and streams enhances the functionality of the base file system
http://bit.ly/fastest-big-data
How we do this with MapR
• MapR Streams is a C++ reimplementation of the Kafka API
– Advantages in predictability, performance, scale
– Common security and permissions with entire MapR converged data platform
• Semantic extensions
– A cluster contains volumes, files, tables … and now streams
– Streams contain topics
– Can have default stream or can name stream by path name
• Core MapR capabilities preserved
– Consistent snapshots, mirrors, multi-master replication
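The "default stream or name stream by path name" extension can be illustrated with a small name resolver. The sketch assumes a convention in which a fully qualified topic is written as a stream path, a colon, and a topic name, and a bare topic name falls back to a configured default stream. The function `resolve_topic` and the path `/apps/default-stream` are hypothetical and exist only for this example.

```python
from typing import Tuple

DEFAULT_STREAM = "/apps/default-stream"  # hypothetical default stream path


def resolve_topic(name: str, default_stream: str = DEFAULT_STREAM) -> Tuple[str, str]:
    """Split 'stream-path:topic' into (stream, topic); bare names use the default."""
    if name.startswith("/") and ":" in name:
        stream, topic = name.rsplit(":", 1)
        return stream, topic
    return default_stream, name


qualified = resolve_topic("/data/pumps:telemetry")
bare = resolve_topic("telemetry")
```

Naming streams by path is what lets them sit in the same namespace as files and tables, and so share volumes and permissions.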
MapR original Innovations
• Volumes
– Distributed management
– Data placement
• Read/write random access file system
– Allows distributed meta-data
– Improved scaling
– Enables NFS access
• Application-level NIC bonding
• Transactionally correct snapshots and mirrors
MapR's Containers
• Each container contains
– Directories & files
– Data blocks
• Replicated on servers
• No need to manage directly
Files/directories are sharded into blocks, which are placed into containers on disks
Containers are 16-32 GB segments of disk, placed on nodes
MapR's Containers
• Each container has a replication chain
• Updates are transactional
• Failures are handled by rearranging replication
Container locations and replication
[Diagram: CLDB listing replication chains (N1, N2), (N3, N2), (N1, N2), (N1, N3), (N3, N2) over nodes N1, N2, N3]
Container location database (CLDB) keeps track of nodes hosting each container and replication chain order
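A toy version of the lookup the CLDB provides: a map from container id to an ordered replication chain, plus a naive response to a node failure. The class and method names are invented for illustration, and the repair policy (drop the failed node, append the first spare not already in the chain) is an assumption; the slides do not describe the real policy.

```python
class CLDB:
    """Toy container location database: container id -> ordered replication chain."""

    def __init__(self):
        self.chains = {}

    def register(self, container_id, chain):
        self.chains[container_id] = list(chain)

    def locate(self, container_id):
        """Return the nodes hosting a container, head of the chain first."""
        return self.chains[container_id]

    def node_failed(self, node, spares):
        """Drop a failed node from every chain and append a spare not already in it."""
        for chain in self.chains.values():
            if node in chain:
                chain.remove(node)
                for spare in spares:
                    if spare not in chain:
                        chain.append(spare)
                        break


cldb = CLDB()
cldb.register("c1", ["N1", "N2"])
cldb.register("c2", ["N3", "N2"])
cldb.register("c3", ["N1", "N3"])

# N1 fails: chains that included it are rearranged; c2 is untouched.
cldb.node_failed("N1", spares=["N2", "N3"])
```

After the failure, the surviving replica moves to the head of each affected chain and a spare is appended behind it.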
MapR Scaling
• Containers represent 16-32 GB of data
– Each can hold up to 1 billion files and directories
– 100M containers = ~2 exabytes (a very large cluster)
• 250 bytes DRAM to cache a container
– 25 GB to cache all containers for 2 EB cluster
– But not necessary, can page to disk
– Typical large 10 PB cluster needs 2 GB
• Container-reports are 100x-1000x smaller than HDFS block-reports
– Serve 100x more data-nodes
– Increase container size to 64 GB to serve 4 EB cluster
– Map/reduce not affected
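The first two figures on this slide follow from simple arithmetic, worked below with decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes). Only the 100M-container capacity and the 25 GB cache figure are checked; the other numbers on the slide are taken as stated.

```python
GB = 10**9
EB = 10**18

containers = 100_000_000          # 100M containers
bytes_per_cached_container = 250  # DRAM per container, from the slide

# Capacity range for 16 GB and 32 GB containers, in exabytes.
low_eb = containers * 16 * GB / EB
high_eb = containers * 32 * GB / EB

# DRAM needed to cache every container's location, in gigabytes.
cache_gb = containers * bytes_per_cached_container / GB

print(f"capacity: {low_eb:.1f} to {high_eb:.1f} EB, cache: {cache_gb:.0f} GB")
```

The range of 1.6 to 3.2 EB brackets the "~2 exabytes" quoted above, and 100M containers at 250 bytes each gives the 25 GB cache figure.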
But Wait, There’s More
• Directories and files are implemented in terms of B-trees
– Key is offset, value is data blob
– Internal transactional semantics guarantee safety and consistency
– Layout algorithms give very high layout linearization
• Tables are implemented in terms of B-trees
– Twisted B-tree implementation allows virtues of log-structured merge tree without the compaction delays
– Tablet splitting without pausing, integration with file system transactions
• Common security and permissions scheme
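The "key is offset, value is data blob" idea for files can be sketched with an ordered map. Below, a sorted list of offsets plus a dictionary stands in for the B-tree; a real B-tree offers the same ordered lookup with on-disk pages and transactions, none of which is modeled here. The class `OffsetKeyedFile` is hypothetical, and the sketch assumes blobs are contiguous and do not overlap.

```python
import bisect


class OffsetKeyedFile:
    """File contents stored as (offset -> blob) entries kept in key order."""

    def __init__(self):
        self._offsets = []  # sorted keys (stand-in for B-tree ordering)
        self._blobs = {}    # offset -> bytes

    def write_blob(self, offset, data):
        if offset not in self._blobs:
            bisect.insort(self._offsets, offset)
        self._blobs[offset] = data

    def read(self, offset, length):
        """Read `length` bytes starting at `offset`, walking blobs in key order."""
        out = bytearray()
        # Find the last blob that starts at or before `offset`.
        i = max(bisect.bisect_right(self._offsets, offset) - 1, 0)
        while i < len(self._offsets) and len(out) < length:
            start = self._offsets[i]
            blob = self._blobs[start]
            # Bytes of this blob that lie before the current read position.
            skip = max(offset + len(out) - start, 0)
            out += blob[skip:skip + length - len(out)]
            i += 1
        return bytes(out)


f = OffsetKeyedFile()
f.write_blob(0, b"hello ")
f.write_blob(6, b"world")

whole = f.read(0, 11)
middle = f.read(3, 6)  # spans both blobs
```

Random reads need only an ordered lookup on the offset key, which is why the same structure can back both sequential and random access.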
[Diagram labels: Table, Tablet, Partition]
Similar to LSM implementations, tables are decomposed by key ranges
Distinct from HBase and LevelDB, MapR tables use a fixed number (greater than 1) of decompositions
Very unusually, relative to LSM and cousins, data structures at the leaf are mutable
Re-use of Proven Technology
Partitions are distributed just like file chunks
Same replication and transaction technology
And More …
• Streams are implemented in terms of B-trees as well
– Topics and consumer offsets are kept in stream, not ZK
– Similar splitting technology as MapR DB tables
– Consistent permissions, security, data replication
• Standard Kafka 0.9 API
• Plans to add OJAI for high-level structuring
• Performance is very high
Example
[Diagram: Cluster with Volume mount point, Directories, Files, Table, Streams]
Lessons
• API’s matter more than implementations
• There is plenty of room to innovate ahead of the community
• POSIX, HDFS, HBase all define useful API’s
• Kafka 0.9+ does the same
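The point that API's matter more than implementations can be shown with code written against an interface only. In the sketch below, `publish_readings` depends on a one-method `Producer` protocol and runs unchanged against two unrelated implementations. All names are hypothetical and this is not the Kafka client API; it only illustrates why a shared API lets implementations be swapped.

```python
from typing import Iterable, Protocol


class Producer(Protocol):
    """The only surface the application depends on."""

    def send(self, topic: str, value: str) -> None: ...


class InMemoryProducer:
    """Implementation 1: keeps every message in a list."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, topic: str, value: str) -> None:
        self.sent.append((topic, value))


class CountingProducer:
    """Implementation 2: keeps only per-topic counts."""

    def __init__(self) -> None:
        self.counts = {}

    def send(self, topic: str, value: str) -> None:
        self.counts[topic] = self.counts.get(topic, 0) + 1


def publish_readings(producer: Producer, topic: str, readings: Iterable[str]) -> int:
    """Application code: knows the API, not the implementation behind it."""
    sent = 0
    for reading in readings:
        producer.send(topic, reading)
        sent += 1
    return sent


memory = InMemoryProducer()
counter = CountingProducer()
n1 = publish_readings(memory, "pump-data", ["p=3.1", "p=3.2"])
n2 = publish_readings(counter, "pump-data", ["p=3.1", "p=3.2"])
```

Neither implementation inherits from `Producer`; matching the method signature is enough, which mirrors how a reimplementation can serve existing clients of a common API.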
Call to action:
Support the common API’s
Call to action:
Support the Kafka API’s
And come by the MapR booth to check out MapR Streams
Read online mapr.com/6ebooks-read
Download pdfs mapr.com/6ebooks-pdf
6 Free ebooks
Streaming
Architecture
Ted Dunning &
Ellen Friedman
and MapR Streams
Thank you for coming today!
…helping you put data technology to work
● Find answers
● Ask technical questions
● Join on-demand training course discussions
● Follow release announcements
● Share and vote on product ideas
● Find Meetup and event listings
Connect with fellow Apache Hadoop and Spark professionals
community.mapr.com
Q&A
@mapr maprtech
tdunning@maprtech.com
Engage with us!
MapR
maprtech
mapr-technologies
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Hortonworks
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
Data Con LA
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise Architecture
MapR Technologies
 
Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016
Adam Doyle
 
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Chris Fregly
 
Ad



Real-time Hadoop: The Ideal Messaging System for Hadoop

  • 1. © 2016 MapR Technologies 1© 2014 MapR Technologies
  • 2. Contact Information: Ted Dunning, Chief Applications Architect at MapR Technologies; Committer & PMC for Apache’s Drill, ZooKeeper & others; VP of Incubator at Apache Foundation. Email: tdunning@apache.org, tdunning@maprtech.com. Twitter: @ted_dunning. Hashtags today: #stratahadoop #ojai
  • 3. Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly). Free copies at book signing today, 3:40PM @ MapR booth. http://bit.ly/mapr-ebook-streams
  • 4. Goals • Real-time or near-time – Includes situations with deadlines – Also includes situations where delay is simply undesirable – Even includes situations where delay is just fine • Micro-services – Streaming is a convenient idiom for design – Micro-services … you know we wanted it – Service isolation is a key requirement
  • 5. Real-time or Near-time? • The real point is flow versus state (see talk later today) • One consequence of flow-based computing is that real-time and near-time become relatively easy • Life may be a bitch, but it doesn’t happen in batches!
  • 6. Agenda • Background / micro-services • Global requirements • Scale
  • 7. A microservice is loosely coupled with bounded context
  • 8. How to Couple Services and Break micro-ness • Shared schemas, relational stores • Ad hoc communication between services • Enterprise service buses • Brittle protocols • Poor protocol versioning
  • 9. How to Decouple Services • Use self-describing data • Private databases • Infrastructural communication between services • Use modern protocols • Adopt future-proof protocol practices • Use shared storage where necessary due to scale
  • 10. What is the Right Structure for Flow Compute? • Traditional message queues? – Message queues are the classic answer – Key feature/bug is out-of-order acknowledgement – Many implementations – You pay a huge performance hit for persistence • Kafka-esque Logs? – Logs are like queues, but with ordering – Out-of-order consumption is possible, acknowledgement not so much – Canonical base implementation is Kafka – Performance plus persistence
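The contrast on slide 10 can be sketched in a few lines. This is a toy in-memory illustration, not Kafka or any real broker; `Log`, `append`, and `read_from` are hypothetical names chosen for the example. It shows the log idea the slide describes: records keep their order, each consumer tracks its own offset, and reading removes nothing, so any consumer can re-read from any position.

```python
# Illustrative sketch only: a toy in-memory log, not the Kafka API.
class Log:
    """Append-only sequence of records addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset, max_records=10):
        """Return up to max_records records starting at offset, in order."""
        return self._records[offset:offset + max_records]


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

# Each consumer owns its position; reads never mutate the log itself.
offsets = {"fraud": 0, "audit": 0}

fraud_batch = log.read_from(offsets["fraud"], max_records=2)
offsets["fraud"] += len(fraud_batch)

audit_batch = log.read_from(offsets["audit"])
offsets["audit"] += len(audit_batch)
```

Because acknowledgement reduces to "advance my offset", there is no per-message bookkeeping on the log side, which is one plausible reading of why the slide pairs ordering with performance.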
  • 11. Scenarios: Profile Database
  • 12. The task [diagram: POS 1 and POS 2 each send location, t, card # and expect a yes/no answer]
  • 13. Traditional Solution [diagram labels: POS 1..n, Fraud detector, Last card use]
  • 14. What Happens Next? [diagram labels: POS 1..n and Fraud detector (three of each), Last card use]
  • 15. What Happens Next? [diagram labels: POS 1..n and Fraud detector (three of each), Last card use]
  • 16. How to Get Service Isolation [diagram labels: POS 1..n, Fraud detector, Last card use, Updater, card activity]
  • 17. New Uses of Data [diagram labels: POS 1..n, Fraud detector, Last card use, Updater, Card location history, Other, card activity]
  • 18. Scaling Through Isolation [diagram labels: POS 1..n, Last card use, Updater and Fraud detector (two of each), card activity]
  • 19. Lessons • De-coupling and isolation are key • Private data stores/tables are important, – but local storage of private data is a bug • Propagate events, not table updates
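The "propagate events, not table updates" lesson from slides 16 to 19 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the event fields, and the plain Python list standing in for the "card activity" stream are all hypothetical. The point it shows is that the fraud detector only emits events, while each downstream service folds those events into its own private table.

```python
# Illustrative sketch only: all names and fields here are hypothetical.
from collections import defaultdict

card_activity = []  # stands in for the "card activity" stream


def publish(card, location, t):
    """The fraud detector emits an event instead of writing to a shared table."""
    card_activity.append({"card": card, "location": location, "t": t})


def update_last_use(events):
    """Updater service: folds events into its private 'last card use' table."""
    last_use = {}
    for e in events:
        last_use[e["card"]] = (e["location"], e["t"])
    return last_use


def update_location_history(events):
    """A later, independent consumer derives a different view from the same events."""
    history = defaultdict(list)
    for e in events:
        history[e["card"]].append(e["location"])
    return dict(history)


publish("1234", "Tokyo", 100)
publish("1234", "Singapore", 160)
publish("9999", "Tokyo", 170)

last_use = update_last_use(card_activity)
history = update_location_history(card_activity)
```

The second consumer was added without touching the fraud detector or the updater, which is the kind of isolation the "New Uses of Data" slide points at.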
  • 20. Scenarios: IoT Data Aggregation
  • 21. Basic Situation [diagram labels: Each location has many pumps; pump data; Multiple locations]
  • 22. What Does a Pump Look Like [diagram: inlet (Temperature, Pressure, Flow); outlet (Temperature, Pressure, Flow); motor (Winding temperature, Voltage, Current)]
  • 23. Basic Situation [diagram labels: Each location has many pumps; pump data; Multiple locations]
  • 24. Basic Architecture Reflects Business Structure [diagram labels: pump data (four sources)]
  • 25. Lessons • Data architecture should reflect business structure • Even very modest designs involve multiple data centers • Schemas cannot be frozen in the real world • Security must follow data ownership
  • 26. Scenarios: Global Data Recovery
  • 27. [map labels: Tokyo, Corporate HQ]
  • 28. [map labels: Singapore, Tokyo, Corporate HQ]
  • 29. [map labels: Singapore, Tokyo, Corporate HQ]
  • 30. [map labels: Singapore, Tokyo, Corporate HQ]
  • 31. Lessons • Arbitrary number of topics important for simplicity + performance • Updates happen in many places • Mobility implies change in replication patterns • Multi-master updates simplify design massively
  • 32. Converged Requirements
  • 33. What Have We Learned? • Need persistence and performance – Possibly for years and to 100’s of millions t/s • Must have convergence – Need files, tables AND streams – Need volumes, snapshots, mirrors, permissions and … • Must have platform security – Cannot depend on perimeter – Must follow business structure • Must have global scale and scope – Millions of topics for natural designs – Multi-master replication and update
  • 34. The Importance of Common API’s • Commonality and interoperability are critical – Compare Hadoop eco-system and the NoSQL world • Table stakes – Persistence – Performance – Polymorphism • Major trend so far is to adopt Kafka API – 0.9 API and beyond remove major abstraction leaks – Kafka API supported by all major Hadoop vendors
  • 35. What we do
  • 36. Evolution of Data Storage [chart axes: Functionality, Compatibility, Scalability; plotted: Linux, POSIX] Over decades of progress, Unix-based systems have set the standard for compatibility and functionality
  • 37. Evolution of Data Storage [chart axes: Functionality, Compatibility, Scalability; plotted: Linux, POSIX, Hadoop] Hadoop achieves much higher scalability by trading away essentially all of this compatibility
  • 38. Evolution of Data Storage [chart axes: Functionality, Compatibility, Scalability; plotted: Linux, POSIX, Hadoop] MapR enhanced Apache Hadoop by restoring the compatibility while increasing scalability and performance
  • 39. Evolution of Data Storage [chart axes: Functionality, Compatibility, Scalability; plotted: Linux, POSIX, Hadoop] Adding tables and streams enhances the functionality of the base file system
  • 40. http://bit.ly/fastest-big-data
  • 41. How we do this with MapR • MapR Streams is a C++ reimplementation of Kafka API – Advantages in predictability, performance, scale – Common security and permissions with entire MapR converged data platform • Semantic extensions – A cluster contains volumes, files, tables … and now streams – Streams contain topics – Can have default stream or can name stream by path name • Core MapR capabilities preserved – Consistent snapshots, mirrors, multi-master replication
  • 42. MapR original Innovations • Volumes – Distributed management – Data placement • Read/write random access file system – Allows distributed meta-data – Improved scaling – Enables NFS access • Application-level NIC bonding • Transactionally correct snapshots and mirrors
  • 43. MapR's Containers • Each container contains – Directories & files – Data blocks • Replicated on servers • No need to manage directly. Files/directories are sharded into blocks, which are placed into containers on disks. Containers are 16-32 GB segments of disk, placed on nodes.
  • 44. MapR's Containers • Each container has a replication chain • Updates are transactional • Failures are handled by rearranging replication
  • 45. Container locations and replication [diagram: CLDB; replication chains N1, N2 / N3, N2 / N1, N2 / N1, N3 / N3, N2; nodes N1, N2, N3] Container location database (CLDB) keeps track of nodes hosting each container and replication chain order
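Slides 44 and 45 describe a lookup from container to an ordered replication chain. A minimal sketch of that idea, assuming hypothetical container IDs (`c1` to `c5`) and function names, and using the five chains shown in the diagram:

```python
# Illustrative sketch only: container IDs and function names are hypothetical.
# The five chains mirror the ones drawn on the slide.
cldb = {
    "c1": ["N1", "N2"],
    "c2": ["N3", "N2"],
    "c3": ["N1", "N2"],
    "c4": ["N1", "N3"],
    "c5": ["N3", "N2"],
}


def locate(container_id):
    """Return the nodes hosting a container, in replication chain order."""
    return cldb[container_id]


def handle_node_failure(failed):
    """Drop a failed node from every chain, keeping the remaining order."""
    for container_id, chain in cldb.items():
        cldb[container_id] = [node for node in chain if node != failed]


handle_node_failure("N2")
```

This only covers the "rearranging" half of failure handling. Choosing a new node and re-replicating the data, which a real system would also have to do, is left out.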
  • 46. MapR Scaling. Containers represent 16-32 GB of data • Each can hold up to 1 billion files and directories • 100M containers = ~2 exabytes (a very large cluster). 250 bytes DRAM to cache a container • 25 GB to cache all containers for 2 EB cluster • But not necessary, can page to disk • Typical large 10 PB cluster needs 2 GB. Container reports are 100x to 1000x smaller than HDFS block reports • Serve 100x more data nodes • Increase container size to 64 GB to serve 4 EB cluster • Map/reduce not affected
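The headline figures on slide 46 can be checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes); the variable names are illustrative.

```python
# Back-of-envelope check of the slide's figures (decimal units assumed).
GB = 10**9
EB = 10**18

containers = 100_000_000   # 100M containers
bytes_per_entry = 250      # DRAM needed to cache one container

low = containers * 16 * GB / EB    # every container at 16 GB
high = containers * 32 * GB / EB   # every container at 32 GB
cache_gb = containers * bytes_per_entry / GB

print(f"capacity: {low:.1f} to {high:.1f} EB")
print(f"location cache: {cache_gb:.0f} GB")
```

The slide's "~2 exabytes" falls inside the 1.6 to 3.2 EB range this produces, and 250 bytes times 100M containers gives the quoted 25 GB.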
  • 47. But Wait, There’s More • Directories and files are implemented in terms of B-trees – Key is offset, value is data blob – Internal transactional semantics guarantees safety and consistency – Layout algorithms give very high layout linearization • Tables are implemented in terms of B-trees – Twisted B-tree implementation allows virtues of log-structured merge tree without the compaction delays – Tablet splitting without pausing, integration with file system transactions • Common security and permissions scheme
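The "key is offset, value is data blob" idea on slide 47 can be illustrated with an ordered map. This is a sketch under loud assumptions: a sorted Python list searched with `bisect` stands in for a B-tree, and `OffsetIndexedFile` is a hypothetical name. It says nothing about how MapR actually lays out its trees; it only shows how a byte-range read maps onto an ordered key lookup.

```python
# Illustrative sketch only: a sorted list plus bisect stands in for a B-tree.
import bisect


class OffsetIndexedFile:
    def __init__(self):
        self._offsets = []  # sorted chunk start offsets (the keys)
        self._blobs = []    # chunk contents (the values), parallel to _offsets

    def append(self, blob):
        """Store blob as a new chunk keyed by its starting offset."""
        start = self._offsets[-1] + len(self._blobs[-1]) if self._offsets else 0
        self._offsets.append(start)
        self._blobs.append(blob)

    def read(self, offset, length):
        """Read up to length bytes from offset, starting at the covering chunk."""
        out = bytearray()
        # Largest key <= offset identifies the chunk that contains the offset.
        i = max(bisect.bisect_right(self._offsets, offset) - 1, 0)
        while i < len(self._blobs) and len(out) < length:
            chunk = self._blobs[i]
            skip = max(0, offset - self._offsets[i])
            out += chunk[skip:skip + length - len(out)]
            i += 1
        return bytes(out)


f = OffsetIndexedFile()
f.append(b"hello")
f.append(b"world!")
middle = f.read(3, 4)  # spans the boundary between the two chunks
```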
  • 48. [diagram labels: Table, Tablet, Partition] Similar to LSM implementations, tables are decomposed by key ranges. Distinct from HBase and LevelDB, MapR tables use a fixed number (greater than 1) of decompositions. Very unusually, relative to LSM and cousins, data structures at the leaf are mutable.
  • 49. Re-use of Proven Technology: Partitions are distributed just like file chunks. Same replication and transaction technology.
  • 50. And More … • Streams are implemented in terms of B-trees as well – Topics and consumer offsets are kept in stream, not ZK – Similar splitting technology as MapR DB tables – Consistent permissions, security, data replication • Standard Kafka 0.9 API • Plans to add OJAI for high-level structuring • Performance is very high
  • 51. Example [diagram labels: Files, Table, Streams, Directories, Cluster, Volume mount point]
  • 52. [diagram labels: Cluster, Volume mount point]
  • 53. Lessons • API’s matter more than implementations • There is plenty of room to innovate ahead of the community • POSIX, HDFS, HBase all define useful API’s • Kafka 0.9+ does the same
  • 54. Call to action: Support the common API’s
  • 55. Call to action: Support the Kafka API’s. And come by the MapR booth to check out MapR Streams
  • 56. © 2016 MapR Technologies 57
  • 57. Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly). Free copies at book signing today. http://bit.ly/mapr-ebook-streams
  • 58. 6 Free ebooks: Streaming Architecture (Ted Dunning & Ellen Friedman) and MapR Streams. Read online: mapr.com/6ebooks-read. Download pdfs: mapr.com/6ebooks-pdf
  • 59. Thank you for coming today!
  • 60. …helping you put data technology to work ● Find answers ● Ask technical questions ● Join on-demand training course discussions ● Follow release announcements ● Share and vote on product ideas ● Find Meetup and event listings. Connect with fellow Apache Hadoop and Spark professionals: community.mapr.com
  • 61. Q&A. Engage with us! @mapr, maprtech, tdunning@maprtech.com, MapR, maprtech, mapr-technologies