SlideShare a Scribd company logo
Kafka Streams
vs
Spark Structured Streaming
Modern Stream Processing Engines Compared
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl / DataMass Summit 2019
● A freelance IT consultant
● Specializing in Spark, Kafka, Kafka Streams, Scala
● Development | Consulting | Training
● "The Internals Of" online books
● Among contributors to Apache Spark
● Among Confluent Community Catalyst (Class of 2019 - 2020)
● Contact me at jacek@japila.pl
● Follow @JacekLaskowski on twitter for more #ApacheSpark
#ApacheKafka #KafkaStreams
Jacek Laskowski
Friendly reminder
Pictures...take a lot of pictures! 📷
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
The Features of Both
1. Stream Processing Engines
2. High-Level DSL for defining processing flow (logic)
a. Topology
b. Dataflow
3. Low-Level API for custom flows
4. Logical and physical plans
a. Logical “what” and executable “how”
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Kafka Streams (1 of 2)
1. Kafka Streams 2.3.0
2. Java and Scala APIs
3. Yet Another Command-Line Application (“YACA”)
a. High-availability and fault tolerance OOTB
b. Creating consumer groups OOTB
4. Support for Apache Kafka only
a. Use Kafka Connect to go beyond Kafka
5. Data Abstractions
a. High-level Streams DSL (KStream, KTable, KGlobalTable)
b. Low-level Processor API
6. One record at a time
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Kafka Streams (2 of 2)
1. ETL only
a. No support for SQL or Machine Learning
b. KSQL
2. Java 11 supported
3. Scala 2.12
4. No interactive shell / REPL for learning and prototyping
5. Rich join support (stream-stream, stream-table,
stream-global table)
6. Reading from and writing to a single Kafka cluster
7. Uses RocksDB for persistent state storage
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Kafka Streams Code / Topology (1 of 2)
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Kafka Streams Code / Execution Env (1 of 2)
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Spark Structured Streaming (1 of 2)
1. Apache Spark 2.4.4
2. Stream Processing API for Scala, Java, Python, SQL
a. Useful for software developers and data scientists
3. Requires cluster manager
a. Apache Hadoop’s YARN / Apache Mesos / DC/OS / Spark
Standalone
4. Lots of data sources
a. Kafka, JSON, parquet, CSV, Avro, ORC, socket
b. Data Source API
5. Data abstraction: streaming Dataset
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Spark Structured Streaming (2 of 2)
1. ETL + Machine Learning
a. Spark MLlib supports streaming Datasets
2. Java 8 only
3. Scala 2.12
4. spark-shell for learning and prototyping
5. Streaming joins and aggregations
6. Reading from one Kafka cluster and writing to another
Kafka cluster
7. Uses Hadoop DFS (HDFS) for checkpointing and
persistent state storage
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Spark “Streams” Code / Loading Data (1 of 3)
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Spark “Streams” Code / Processing (2 of 3)
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Spark “Streams” Code / Saving Data (3 of 3)
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
“The Internals Of” Online Books
1. The Internals of Kafka Streams
2. The Internals of Spark Structured Streaming
Questions?
1. Follow @jaceklaskowski on twitter (DMs open)
2. Upvote my questions and answers on StackOverflow
3. Contact me at jacek@japila.pl
4. Connect with me at LinkedIn
© Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
Ad

More Related Content

What's hot (20)

Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
Databricks
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Databricks
 
Whirlpools in the Stream with Jayesh Lalwani
 Whirlpools in the Stream with Jayesh Lalwani Whirlpools in the Stream with Jayesh Lalwani
Whirlpools in the Stream with Jayesh Lalwani
Databricks
 
Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)
Prasad Wagle
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Jen Aman
 
Keynote at spark summit east anjul
Keynote at spark summit east anjulKeynote at spark summit east anjul
Keynote at spark summit east anjul
Anjul Bhambhri
 
Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)
Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)
Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)
Databricks
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Databricks
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
Jen Aman
 
Databricks with R: Deep Dive
Databricks with R: Deep DiveDatabricks with R: Deep Dive
Databricks with R: Deep Dive
Databricks
 
Interoperating a Zoo of Data Processing Platforms Using with Rheem Sebastian ...
Interoperating a Zoo of Data Processing Platforms Using with Rheem Sebastian ...Interoperating a Zoo of Data Processing Platforms Using with Rheem Sebastian ...
Interoperating a Zoo of Data Processing Platforms Using with Rheem Sebastian ...
Databricks
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
Spark Summit
 
Future of data visualization
Future of data visualizationFuture of data visualization
Future of data visualization
hadoopsphere
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
Databricks
 
SparkR + Zeppelin
SparkR + ZeppelinSparkR + Zeppelin
SparkR + Zeppelin
felixcss
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Lucidworks
 
Automatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked DataAutomatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked Data
Albert Meroño-Peñuela
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
Spark Summit
 
What's New in Spark 2?
What's New in Spark 2?What's New in Spark 2?
What's New in Spark 2?
Eyal Ben Ivri
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduce
Edureka!
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
Databricks
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Databricks
 
Whirlpools in the Stream with Jayesh Lalwani
 Whirlpools in the Stream with Jayesh Lalwani Whirlpools in the Stream with Jayesh Lalwani
Whirlpools in the Stream with Jayesh Lalwani
Databricks
 
Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)
Prasad Wagle
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Jen Aman
 
Keynote at spark summit east anjul
Keynote at spark summit east anjulKeynote at spark summit east anjul
Keynote at spark summit east anjul
Anjul Bhambhri
 
Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)
Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)
Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)
Databricks
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Databricks
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
Jen Aman
 
Databricks with R: Deep Dive
Databricks with R: Deep DiveDatabricks with R: Deep Dive
Databricks with R: Deep Dive
Databricks
 
Interoperating a Zoo of Data Processing Platforms Using with Rheem Sebastian ...
Interoperating a Zoo of Data Processing Platforms Using with Rheem Sebastian ...Interoperating a Zoo of Data Processing Platforms Using with Rheem Sebastian ...
Interoperating a Zoo of Data Processing Platforms Using with Rheem Sebastian ...
Databricks
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
Spark Summit
 
Future of data visualization
Future of data visualizationFuture of data visualization
Future of data visualization
hadoopsphere
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
Databricks
 
SparkR + Zeppelin
SparkR + ZeppelinSparkR + Zeppelin
SparkR + Zeppelin
felixcss
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Lucidworks
 
Automatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked DataAutomatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked Data
Albert Meroño-Peñuela
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
Spark Summit
 
What's New in Spark 2?
What's New in Spark 2?What's New in Spark 2?
What's New in Spark 2?
Eyal Ben Ivri
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduce
Edureka!
 

Similar to  Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engines Compared (20)

Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Helena Edelson
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Dharmjit Singh
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
Databricks
 
실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재
hkyoon2
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
AKASH SIHAG
 
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
confluent
 
Edbt19 paper 329
Edbt19 paper 329Edbt19 paper 329
Edbt19 paper 329
LUIS ALBEIRO GIRALDO BETANCOURTH
 
Apache Spark Streaming
Apache Spark StreamingApache Spark Streaming
Apache Spark Streaming
Bartosz Jankiewicz
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
Knoldus Inc.
 
What is apache Kafka?
What is apache Kafka?What is apache Kafka?
What is apache Kafka?
Kenny Gorman
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
Eventador
 
Apache Kafka 0.11 の Exactly Once Semantics
Apache Kafka 0.11 の Exactly Once SemanticsApache Kafka 0.11 の Exactly Once Semantics
Apache Kafka 0.11 の Exactly Once Semantics
Yoshiyasu SAEKI
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
Slim Baltagi
 
Spark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark GroupSpark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark Group
Phaneendra Chiruvella
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Michael Noll
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Codemotion
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
Cedric Vidal
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Helena Edelson
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
Databricks
 
실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재
hkyoon2
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
AKASH SIHAG
 
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
confluent
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
Knoldus Inc.
 
What is apache Kafka?
What is apache Kafka?What is apache Kafka?
What is apache Kafka?
Kenny Gorman
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
Eventador
 
Apache Kafka 0.11 の Exactly Once Semantics
Apache Kafka 0.11 の Exactly Once SemanticsApache Kafka 0.11 の Exactly Once Semantics
Apache Kafka 0.11 の Exactly Once Semantics
Yoshiyasu SAEKI
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
Slim Baltagi
 
Spark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark GroupSpark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark Group
Phaneendra Chiruvella
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Michael Noll
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Codemotion
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
Cedric Vidal
 
Ad

More from Jacek Laskowski (11)

Opening slides to Warsaw Scala FortyFives on Testing tools
Opening slides to Warsaw Scala FortyFives on Testing toolsOpening slides to Warsaw Scala FortyFives on Testing tools
Opening slides to Warsaw Scala FortyFives on Testing tools
Jacek Laskowski
 
#Be #social #FTW aka Your #Professional #Development with #StackOverflow #Git...
#Be #social #FTW aka Your #Professional #Development with #StackOverflow #Git...#Be #social #FTW aka Your #Professional #Development with #StackOverflow #Git...
#Be #social #FTW aka Your #Professional #Development with #StackOverflow #Git...
Jacek Laskowski
 
StackOverflow, GitHub, twitter, reddit i Twój profesjonalny rozwój
StackOverflow, GitHub, twitter, reddit i Twój profesjonalny rozwójStackOverflow, GitHub, twitter, reddit i Twój profesjonalny rozwój
StackOverflow, GitHub, twitter, reddit i Twój profesjonalny rozwój
Jacek Laskowski
 
Introduction to Web Application Development in Clojure
Introduction to Web Application Development in ClojureIntroduction to Web Application Development in Clojure
Introduction to Web Application Development in Clojure
Jacek Laskowski
 
Introduction to Functional Programming in Scala
Introduction to Functional Programming in ScalaIntroduction to Functional Programming in Scala
Introduction to Functional Programming in Scala
Jacek Laskowski
 
Moje pierwsze kroki w programowaniu funkcyjnym w Scali
Moje pierwsze kroki w programowaniu funkcyjnym w ScaliMoje pierwsze kroki w programowaniu funkcyjnym w Scali
Moje pierwsze kroki w programowaniu funkcyjnym w Scali
Jacek Laskowski
 
Functional web development with Git(Hub), Heroku and Clojure
Functional web development with Git(Hub), Heroku and ClojureFunctional web development with Git(Hub), Heroku and Clojure
Functional web development with Git(Hub), Heroku and Clojure
Jacek Laskowski
 
Praktyczne wprowadzenie do OSGi i Enterprise OSGi
Praktyczne wprowadzenie do OSGi i Enterprise OSGiPraktyczne wprowadzenie do OSGi i Enterprise OSGi
Praktyczne wprowadzenie do OSGi i Enterprise OSGi
Jacek Laskowski
 
Developing modular applications with Java EE 6 and Enterprise OSGi + WebSpher...
Developing modular applications with Java EE 6 and Enterprise OSGi + WebSpher...Developing modular applications with Java EE 6 and Enterprise OSGi + WebSpher...
Developing modular applications with Java EE 6 and Enterprise OSGi + WebSpher...
Jacek Laskowski
 
Apache Tomcat + Java EE = Apache TomEE
Apache Tomcat + Java EE = Apache TomEEApache Tomcat + Java EE = Apache TomEE
Apache Tomcat + Java EE = Apache TomEE
Jacek Laskowski
 
(map Clojure everyday-tasks)
(map Clojure everyday-tasks)(map Clojure everyday-tasks)
(map Clojure everyday-tasks)
Jacek Laskowski
 
Opening slides to Warsaw Scala FortyFives on Testing tools
Opening slides to Warsaw Scala FortyFives on Testing toolsOpening slides to Warsaw Scala FortyFives on Testing tools
Opening slides to Warsaw Scala FortyFives on Testing tools
Jacek Laskowski
 
#Be #social #FTW aka Your #Professional #Development with #StackOverflow #Git...
#Be #social #FTW aka Your #Professional #Development with #StackOverflow #Git...#Be #social #FTW aka Your #Professional #Development with #StackOverflow #Git...
#Be #social #FTW aka Your #Professional #Development with #StackOverflow #Git...
Jacek Laskowski
 
StackOverflow, GitHub, twitter, reddit i Twój profesjonalny rozwój
StackOverflow, GitHub, twitter, reddit i Twój profesjonalny rozwójStackOverflow, GitHub, twitter, reddit i Twój profesjonalny rozwój
StackOverflow, GitHub, twitter, reddit i Twój profesjonalny rozwój
Jacek Laskowski
 
Introduction to Web Application Development in Clojure
Introduction to Web Application Development in ClojureIntroduction to Web Application Development in Clojure
Introduction to Web Application Development in Clojure
Jacek Laskowski
 
Introduction to Functional Programming in Scala
Introduction to Functional Programming in ScalaIntroduction to Functional Programming in Scala
Introduction to Functional Programming in Scala
Jacek Laskowski
 
Moje pierwsze kroki w programowaniu funkcyjnym w Scali
Moje pierwsze kroki w programowaniu funkcyjnym w ScaliMoje pierwsze kroki w programowaniu funkcyjnym w Scali
Moje pierwsze kroki w programowaniu funkcyjnym w Scali
Jacek Laskowski
 
Functional web development with Git(Hub), Heroku and Clojure
Functional web development with Git(Hub), Heroku and ClojureFunctional web development with Git(Hub), Heroku and Clojure
Functional web development with Git(Hub), Heroku and Clojure
Jacek Laskowski
 
Praktyczne wprowadzenie do OSGi i Enterprise OSGi
Praktyczne wprowadzenie do OSGi i Enterprise OSGiPraktyczne wprowadzenie do OSGi i Enterprise OSGi
Praktyczne wprowadzenie do OSGi i Enterprise OSGi
Jacek Laskowski
 
Developing modular applications with Java EE 6 and Enterprise OSGi + WebSpher...
Developing modular applications with Java EE 6 and Enterprise OSGi + WebSpher...Developing modular applications with Java EE 6 and Enterprise OSGi + WebSpher...
Developing modular applications with Java EE 6 and Enterprise OSGi + WebSpher...
Jacek Laskowski
 
Apache Tomcat + Java EE = Apache TomEE
Apache Tomcat + Java EE = Apache TomEEApache Tomcat + Java EE = Apache TomEE
Apache Tomcat + Java EE = Apache TomEE
Jacek Laskowski
 
(map Clojure everyday-tasks)
(map Clojure everyday-tasks)(map Clojure everyday-tasks)
(map Clojure everyday-tasks)
Jacek Laskowski
 
Ad

Recently uploaded (20)

Process Mining at Deutsche Bank - Journey
Process Mining at Deutsche Bank - JourneyProcess Mining at Deutsche Bank - Journey
Process Mining at Deutsche Bank - Journey
Process mining Evangelist
 
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
bastakwyry
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
AI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptxAI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptx
AyeshaJalil6
 
Ann Naser Nabil- Data Scientist Portfolio.pdf
Ann Naser Nabil- Data Scientist Portfolio.pdfAnn Naser Nabil- Data Scientist Portfolio.pdf
Ann Naser Nabil- Data Scientist Portfolio.pdf
আন্ নাসের নাবিল
 
2024 Digital Equity Accelerator Report.pdf
2024 Digital Equity Accelerator Report.pdf2024 Digital Equity Accelerator Report.pdf
2024 Digital Equity Accelerator Report.pdf
dominikamizerska1
 
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdfPublication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
StatsCommunications
 
Transforming health care with ai powered
Transforming health care with ai poweredTransforming health care with ai powered
Transforming health care with ai powered
gowthamarvj
 
problem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursingproblem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursing
vishnudathas123
 
HershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistributionHershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistribution
hershtara1
 
Fundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithmsFundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithms
priyaiyerkbcsc
 
hersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distributionhersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distribution
hershtara1
 
AWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdfAWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdf
philsparkshome
 
Mining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - MicrosoftMining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - Microsoft
Process mining Evangelist
 
real illuminati Uganda agent 0782561496/0756664682
real illuminati Uganda agent 0782561496/0756664682real illuminati Uganda agent 0782561496/0756664682
real illuminati Uganda agent 0782561496/0756664682
way to join real illuminati Agent In Kampala Call/WhatsApp+256782561496/0756664682
 
Introduction to systems thinking tools_Eng.pdf
Introduction to systems thinking tools_Eng.pdfIntroduction to systems thinking tools_Eng.pdf
Introduction to systems thinking tools_Eng.pdf
AbdurahmanAbd
 
How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?
Process mining Evangelist
 
What is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdfWhat is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdf
SaikatBasu37
 
lecture_13 tree in mmmmmmmm mmmmmfftro.pptx
lecture_13 tree in mmmmmmmm     mmmmmfftro.pptxlecture_13 tree in mmmmmmmm     mmmmmfftro.pptx
lecture_13 tree in mmmmmmmm mmmmmfftro.pptx
sarajafffri058
 
CS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docxCS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docx
nidarizvitit
 
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
bastakwyry
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
AI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptxAI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptx
AyeshaJalil6
 
2024 Digital Equity Accelerator Report.pdf
2024 Digital Equity Accelerator Report.pdf2024 Digital Equity Accelerator Report.pdf
2024 Digital Equity Accelerator Report.pdf
dominikamizerska1
 
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdfPublication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
Publication-launch-How-is-Life-for-Children-in-the-Digital-Age-15-May-2025.pdf
StatsCommunications
 
Transforming health care with ai powered
Transforming health care with ai poweredTransforming health care with ai powered
Transforming health care with ai powered
gowthamarvj
 
problem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursingproblem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursing
vishnudathas123
 
HershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistributionHershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistribution
hershtara1
 
Fundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithmsFundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithms
priyaiyerkbcsc
 
hersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distributionhersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distribution
hershtara1
 
AWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdfAWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdf
philsparkshome
 
Mining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - MicrosoftMining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - Microsoft
Process mining Evangelist
 
Introduction to systems thinking tools_Eng.pdf
Introduction to systems thinking tools_Eng.pdfIntroduction to systems thinking tools_Eng.pdf
Introduction to systems thinking tools_Eng.pdf
AbdurahmanAbd
 
How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?
Process mining Evangelist
 
What is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdfWhat is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdf
SaikatBasu37
 
lecture_13 tree in mmmmmmmm mmmmmfftro.pptx
lecture_13 tree in mmmmmmmm     mmmmmfftro.pptxlecture_13 tree in mmmmmmmm     mmmmmfftro.pptx
lecture_13 tree in mmmmmmmm mmmmmfftro.pptx
sarajafffri058
 
CS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docxCS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docx
nidarizvitit
 

 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engines Compared

  • 1. Kafka Streams vs Spark Structured Streaming Modern Stream Processing Engines Compared © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl / DataMass Summit 2019
  • 2. ● A freelance IT consultant ● Specializing in Spark, Kafka, Kafka Streams, Scala ● Development | Consulting | Training ● "The Internals Of" online books ● Among contributors to Apache Spark ● Among Confluent Community Catalyst (Class of 2019 - 2020) ● Contact me at jacek@japila.pl ● Follow @JacekLaskowski on twitter for more #ApacheSpark #ApacheKafka #KafkaStreams Jacek Laskowski
  • 3. Friendly reminder Pictures...take a lot of pictures! 📷 © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 4. The Features of Both 1. Stream Processing Engines 2. High-Level DSL for defining processing flow (logic) a. Topology b. Dataflow 3. Low-Level API for custom flows 4. Logical and physical plans a. Logical “what” and executable “how” © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 5. Kafka Streams (1 of 2) 1. Kafka Streams 2.3.0 2. Java and Scala APIs 3. Yet Another Command-Line Application (“YACA”) a. High-availability and fault tolerance OOTB b. Creating consumer groups OOTB 4. Support for Apache Kafka only a. Use Kafka Connect to go beyond Kafka 5. Data Abstractions a. High-level Streams DSL (KStream, KTable, KGlobalTable) b. Low-level Processor API 6. One record at a time © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 6. Kafka Streams (2 of 2) 1. ETL only a. No support for SQL or Machine Learning b. KSQL 2. Java 11 supported 3. Scala 2.12 4. No interactive shell / REPL for learning and prototyping 5. Rich join support (stream-stream, stream-table, stream-global table) 6. Reading from and writing to a single Kafka cluster 7. Uses RocksDB for persistent state storage © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 7. Kafka Streams Code / Topology (1 of 2) © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 8. Kafka Streams Code / Execution Env (1 of 2) © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 9. Spark Structured Streaming (1 of 2) 1. Apache Spark 2.4.4 2. Stream Processing API for Scala, Java, Python, SQL a. Useful for software developers and data scientists 3. Requires cluster manager a. Apache Hadoop’s YARN / Apache Mesos / DC/OS / Spark Standalone 4. Lots of data sources a. Kafka, JSON, parquet, CSV, Avro, ORC, socket b. Data Source API 5. Data abstraction: streaming Dataset © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 10. Spark Structured Streaming (2 of 2) 1. ETL + Machine Learning a. Spark MLlib supports streaming Datasets 2. Java 8 only 3. Scala 2.12 4. spark-shell for learning and prototyping 5. Streaming joins and aggregations 6. Reading from one Kafka cluster and writing to another Kafka cluster 7. Uses Hadoop DFS (HDFS) for checkpointing and persistent state storage © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 11. Spark “Streams” Code / Loading Data (1 of 3) © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 12. Spark “Streams” Code / Processing (2 of 3) © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 13. Spark “Streams” Code / Saving Data (3 of 3) © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  • 14. “The Internals Of” Online Books 1. The Internals of Kafka Streams 2. The Internals of Spark Structured Streaming
  • 15. Questions? 1. Follow @jaceklaskowski on twitter (DMs open) 2. Upvote my questions and answers on StackOverflow 3. Contact me at jacek@japila.pl 4. Connect with me at LinkedIn © Jacek Laskowski / @JacekLaskowski / jacek@japila.pl
  翻译: