Enabling Data Scientists to easily create and own Kafka Consumers
Stefan Krawczyk
Mgr. Data Platform - Model Lifecycle
@stefkrawczyk
linkedin.com/in/skrawczyk
Try out Stitch Fix → goo.gl/Q3tCQ3
Agenda
- What is Stitch Fix?
- Data Science @ Stitch Fix
- Stitch Fix’s opinionated Kafka consumer
- Learnings & Future Directions
What is Stitch Fix?
What does the company do?
Kafka Summit 2021 4
Stitch Fix is a personal styling service.
Shop at your personal curated store. Check out what you like.
Data Science is behind everything we do.
algorithms-tour.stitchfix.com
Algorithms Org.
- 145+ Data Scientists and Platform Engineers
- 3 main verticals + platform
Data Platform
whoami
Stefan Krawczyk
Mgr. Data Platform - Model Lifecycle
(Photo: pre-covid look)
Data Science @ Stitch Fix
Expectations we have on DS @ Stitch Fix
Most common approach to Data Science
Typical organization:
● Horizontal teams
● Hand off
● Coordination required
(Diagram: DATA SCIENCE / RESEARCH TEAMS → ETL TEAMS → ENGINEERING TEAMS)
At Stitch Fix:
● Single Organization
● No handoff
● End to end ownership
● We have a lot of them!
● Built on top of data
platform tools &
abstractions.
Full Stack Data Science
See https://cultivating-algos.stitchfix.com/
(Diagram: DATA SCIENCE, ETL & ENGINEERING owned end to end by one data scientist)
Full Stack Data Science
A typical DS flow at Stitch Fix
Typical flow:
● Idea / Prototype
● ETL
● “Production”
● Eval/Monitoring/Oncall
● Start on next iteration
Full Stack Data Science
A typical DS flow at Stitch Fix
Production can mean:
● Web service
● Batch job / Table
● Kafka consumer
Heavily biased towards Python.
Example use cases DS have built kafka consumers for
Example Kafka Consumers
● A/B testing bucket allocation
● Transforming raw inputs into features
● Saving data into feature stores
● Event driven model prediction
● Triggering workflows
Stitch Fix’s opinionated Kafka consumer
Code first, explanation second
Our “Hello world”
Consumer Code [ ]
Architecture [ ]
Mechanics [ ]

A simple example: Hello world consumer

hello_world_consumer.py

import json
from typing import List

import sf_kafka


@sf_kafka.register(kafka_topic='some.topic', output_schema={})
def hello_world(messages: List[str]) -> dict:
    """Hello world example

    :param messages: list of strings, which are JSON objects.
    :return: empty dict, as we don't need to emit any events.
    """
    list_of_dicts = [json.loads(m) for m in messages]
    print(f'Hello world I have consumed the following {list_of_dicts}')
    return {}

To run this:
> pip install sf_kafka
> python -m sf_kafka.server hello_world_consumer

So what is this doing?
1. A Python function that takes in a list of strings called messages.
2. We’re processing the messages into dictionaries.
3. Printing them to the console. (DS would replace this with a call to their own function.)
4. We’re registering this function to consume from ‘some.topic’ with no output.
So what’s really going on?
When someone runs python -m sf_kafka.server hello_world_consumer
Consumer Code ✅
Architecture [ ]
Mechanics [ ]
(Architecture diagram, built up in four steps; components are split into Platform Concerns vs DS Concerns.)
Platform Concerns vs DS Concerns

Consumer Code ✅
Architecture ✅
Mechanics [ ]

What does each side own?

Platform Concerns:
● Kafka consumer operation:
  ○ What python kafka client to use
  ○ Kafka client configuration
  ○ Processing assumptions
    ■ At least once or at most once
  ○ How to write back to kafka
    ■ Direct to cluster or via a proxy?
    ■ Message serialization format
● Production operations:
  ○ Topic partitioning
  ○ Deployment vehicle for consumers
  ○ Topic monitoring hooks & tools
Most of these can change without DS involvement -- the platform just needs to rebuild their app; a few require coordination with DS.

DS Concerns:
● Configuration:
  ○ App name [required]
  ○ Which topic(s) to consume from [required]
  ○ Process from beginning/end? [optional]
  ○ Processing “batch” size [optional]
  ○ Number of consumers [optional]
● Python function that operates over a list
● Output topic & message [if any]
● Oncall
Salient choices we made on Platform

Kafka Client: python confluent-kafka (librdkafka).
Benefit: librdkafka is very performant & stable.

Processing assumption: at least once; functions should be idempotent.
Benefit: enables a very easy error recovery strategy:
● Consumer app breaks until it is fixed; can usually wait until business hours.
● No loss of events.
● Monitoring trigger is consumer lag.

Message serialization format: JSON.
Benefit: easy mapping to and from python dictionaries; easy to grok for DS.
(* python support for other formats wasn’t great.)

Do we want to write back to kafka directly? Write via a proxy service first.
This enabled:
● Not having producer code in the engine.
● Ability to validate/introspect all messages.
● Ability to augment/change minor format structure without having to redeploy all consumers.
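The at-least-once choice comes down to committing offsets only after the function has processed the batch; if processing raises, nothing is committed and the batch is redelivered. (With confluent-kafka this means disabling auto-commit and calling commit() after processing.) The control flow is sketched here with a tiny in-memory stand-in for the Kafka client, so it runs without a broker:

```python
from typing import Callable, List


class FakeConsumer:
    """In-memory stand-in for a Kafka consumer (illustrative only)."""

    def __init__(self, messages: List[str]):
        self._messages = messages
        self.committed = 0  # offset of the next unprocessed message

    def poll_batch(self, size: int) -> List[str]:
        return self._messages[self.committed:self.committed + size]

    def commit(self, n: int) -> None:
        self.committed += n


def run_once(consumer: FakeConsumer,
             func: Callable[[List[str]], dict],
             batch_size: int = 2) -> None:
    batch = consumer.poll_batch(batch_size)
    if not batch:
        return
    func(batch)                  # may raise: offsets are NOT committed,
    consumer.commit(len(batch))  # so the batch is redelivered -> at least once


consumer = FakeConsumer(['m1', 'm2', 'm3'])
run_once(consumer, lambda msgs: {})  # processes m1, m2; commits offset 2
```

Because a failed batch is simply reprocessed once the function is fixed, this is why the DS functions must be idempotent, and why consumer lag is the natural monitoring trigger.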
Consumer Code ✅
Architecture ✅
Mechanics ✅
What’s missing? Self-service ⍰

Completing the production story
The self-service story of how a DS gets a consumer to production
Example Use Case: Event Driven Model Prediction
1. Client signs up & fills out profile.
2. Event is sent - client.signed_up.
3. Predict something about the client.
4. Emit predictions back to kafka.
5. Use this for email campaigns.
-> $$
The self-service story of how a DS gets a consumer to production:

1. Determine the topic(s) to consume.

2. Write code:
   a. Create a function & decorate it to process events.
   b. If outputting an event, write a schema.
   c. Commit code to a git repository.

my_prediction.py

import json
from typing import List

import sf_kafka

# Schema that we want to validate against.
schema = {
    'metadata': {
        'timestamp': str,
        'id': str,
        'version': str
    },
    'payload': {
        'some_prediction_value': float,
        'client': int
    }
}


def create_prediction(client: dict) -> dict:
    # DS would write side effects or fetches here.
    # E.g. grab features, predict, create the output message.
    prediction = ...
    return make_output_event(client, prediction)


@sf_kafka.register(
    kafka_topic='client.signed_up',
    output_schema={'predict.topic': schema})
def predict_foo(messages: List[str]) -> dict:
    """Predict XX about a client. ..."""
    clients = [json.loads(m) for m in messages]
    predictions = [create_prediction(c) for c in clients]
    return {'predict.topic': predictions}

3. Deploy via command line:
   a. Handles python environment creation
   b. Builds docker container
   c. Deploys
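The output schema shown earlier is a plain dict mapping field names to Python types (or nested sub-schemas). The platform’s actual validation is internal; a minimal sketch of type-checking a message against such a schema might look like this:

```python
from typing import Any, Dict


def validate(message: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """Recursively check that `message` matches a {field: type-or-subschema} dict."""
    for field, expected in schema.items():
        if field not in message:
            return False
        value = message[field]
        if isinstance(expected, dict):            # nested sub-schema
            if not isinstance(value, dict) or not validate(value, expected):
                return False
        elif not isinstance(value, expected):     # leaf: a Python type
            return False
    return True


schema = {
    'metadata': {'timestamp': str, 'id': str, 'version': str},
    'payload': {'some_prediction_value': float, 'client': int},
}
msg = {
    'metadata': {'timestamp': '2021-05-11T00:00:00', 'id': 'abc', 'version': '1'},
    'payload': {'some_prediction_value': 0.42, 'client': 123},
}
assert validate(msg, schema)
```

Because schemas are just data, the platform can run checks like this centrally (e.g. in the write-back proxy) without the DS shipping any validation code.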
Self-service deployment via command line
(Deployment diagram, built up in three steps, with DS touch points marked.)
Can be in production in < 1 hour. Self-service!
The self-service story of how a DS gets a consumer to production, in full:
1. Determine the topic(s) to consume.
2. Write code.
3. Deploy via command line.
4. Oncall:
   a. Small runbook
Learnings & Future Directions
What we learned from this and where we’re looking to go.
Learnings - DS Perspective

Do they use it? ✅ 👍

Focusing on the function:
1. All they need to know about kafka is that it’ll give them a list of events.
2. Leads to better separation of concerns:
   a. Can split driver code versus their logic.
   b. Test driven development is easy.

At least once processing:
1. They enjoy easy error recovery; it gives DS time to fix things.
2. The idempotency requirement is not an issue.
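Because a consumer is just a function over a list of JSON strings, the “test driven development is easy” point falls out directly: a test constructs JSON strings and asserts on the return value, with no broker and no sf_kafka involved. The handler below is illustrative (not Stitch Fix code):

```python
import json
from typing import List


def count_signups(messages: List[str]) -> dict:
    """Illustrative handler: count sign-up events per region."""
    counts: dict = {}
    for m in messages:
        event = json.loads(m)
        region = event['region']
        counts[region] = counts.get(region, 0) + 1
    return {'signup.counts': counts}


# Plain unit test -- no Kafka needed:
events = [json.dumps({'region': 'US'}),
          json.dumps({'region': 'UK'}),
          json.dumps({'region': 'US'})]
assert count_signups(events) == {'signup.counts': {'US': 2, 'UK': 1}}
```

This is the separation of concerns in practice: the driver (polling, committing) lives in the platform engine and the DS only tests their pure function.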
Learnings - Platform Perspective (1/2)

Writing back via a proxy service:
1. Helped early on with some minor message format adjustments & validation.
2. We would recommend writing back directly if we were to start again.
   a. Writing back directly leads to better performance.

A central place for all things kafka. Very useful to have a central place to:
1. Understand topics & topic contents.
2. Offer an “off the shelf” ability to materialize a stream to a datastore (e.g. elasticsearch, data warehouse, feature store), which removed the need for DS to manage/optimize this process.
Learnings - Platform Perspective (2/2)

Using internal async libraries: using internal asyncio libs is cumbersome for DS; a native asyncio framework would feel better.*

Lineage & lineage impacts: where there is a chain of consumers, we didn’t have easy introspection into:
● The processing speed of the full chain
● Knowing what the chain was

* We ended up creating a narrowly focused micro-framework addressing these two issues using aiokafka.
Future Directions

Being able to replace different subcomponents & assumptions of the system more easily.
Why? Cleaner abstractions & modularity:
● We want to stop business logic leaking into the engine.
● Making parts pluggable means we can easily change or swap out e.g. schema validation, serialization format, how we write back to kafka, processing assumptions, asyncio support, etc.

Exploring stream processing frameworks like Kafka Streams & Faust.
Why? Stream processing over windows is slowly becoming something more DS ask about.

Writing an open source version.
Why? Hypothesis that this is valuable and that the community would be interested; would you be?
Summary

TL;DR: Kafka + Data Scientists @ Stitch Fix:
● We have a self-service platform for Data Scientists to deploy kafka consumers.
● We achieve self-service through a separation of concerns:
  ○ Data Scientists focus on functions to process events.
  ○ Data Platform provides guardrails for kafka operations.
Questions?
Find me at:
@stefkrawczyk
linkedin.com/in/skrawczyk/ Try out Stitch Fix → goo.gl/Q3tCQ3
Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...
HostedbyConfluent
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
HostedbyConfluent
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven Microservices
Fabrizio Fortino
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
Shu-Jeng Hsieh
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
Guido Schmutz
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
GoDataDriven
 
Apache Pulsar Development 101 with Python
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
Timothy Spann
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
Abhinav Singh
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Yaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
confluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Follow the (Kafka) Streams
Follow the (Kafka) StreamsFollow the (Kafka) Streams
Follow the (Kafka) Streams
confluent
 
Updating materialized views and caches using kafka
Updating materialized views and caches using kafkaUpdating materialized views and caches using kafka
Updating materialized views and caches using kafka
Zach Cox
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
Guozhang Wang
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
Athens Big Data
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipes
inside-BigData.com
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
Robbie Strickland
 
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
Alexandre Gouaillard
 
Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...
Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...
Making your Life Easier with MongoDB and Kafka (Robert Walters, MongoDB) Kafk...
HostedbyConfluent
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
HostedbyConfluent
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven Microservices
Fabrizio Fortino
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
Shu-Jeng Hsieh
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
Guido Schmutz
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
GoDataDriven
 
Apache Pulsar Development 101 with Python
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
Timothy Spann
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
Abhinav Singh
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Yaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
confluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Follow the (Kafka) Streams
Follow the (Kafka) StreamsFollow the (Kafka) Streams
Follow the (Kafka) Streams
confluent
 
Ad

Recently uploaded (20)

Solar-wind hybrid engery a system sustainable power
Solar-wind  hybrid engery a system sustainable powerSolar-wind  hybrid engery a system sustainable power
Solar-wind hybrid engery a system sustainable power
bhoomigowda12345
 
How I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetryHow I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetry
Cees Bos
 
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by AjathMobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Ajath Infotech Technologies LLC
 
[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts
Dimitrios Platis
 
Programs as Values - Write code and don't get lost
Programs as Values - Write code and don't get lostPrograms as Values - Write code and don't get lost
Programs as Values - Write code and don't get lost
Pierangelo Cecchetto
 
wAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptxwAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptx
SimonedeGijt
 
GDS SYSTEM | GLOBAL DISTRIBUTION SYSTEM
GDS SYSTEM | GLOBAL  DISTRIBUTION SYSTEMGDS SYSTEM | GLOBAL  DISTRIBUTION SYSTEM
GDS SYSTEM | GLOBAL DISTRIBUTION SYSTEM
philipnathen82
 
Sequence Diagrams With Pictures (1).pptx
Sequence Diagrams With Pictures (1).pptxSequence Diagrams With Pictures (1).pptx
Sequence Diagrams With Pictures (1).pptx
aashrithakondapalli8
 
Medical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk ScoringMedical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk Scoring
ICS
 
Best HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRMBest HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRM
accordHRM
 
Wilcom Embroidery Studio Crack Free Latest 2025
Wilcom Embroidery Studio Crack Free Latest 2025Wilcom Embroidery Studio Crack Free Latest 2025
Wilcom Embroidery Studio Crack Free Latest 2025
Web Designer
 
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint PresentationFrom Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
Shay Ginsbourg
 
Buy vs. Build: Unlocking the right path for your training tech
Buy vs. Build: Unlocking the right path for your training techBuy vs. Build: Unlocking the right path for your training tech
Buy vs. Build: Unlocking the right path for your training tech
Rustici Software
 
Artificial hand using embedded system.pptx
Artificial hand using embedded system.pptxArtificial hand using embedded system.pptx
Artificial hand using embedded system.pptx
bhoomigowda12345
 
Adobe Media Encoder Crack FREE Download 2025
Adobe Media Encoder  Crack FREE Download 2025Adobe Media Encoder  Crack FREE Download 2025
Adobe Media Encoder Crack FREE Download 2025
zafranwaqar90
 
Robotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptxRobotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptx
julia smits
 
Troubleshooting JVM Outages – 3 Fortune 500 case studies
Troubleshooting JVM Outages – 3 Fortune 500 case studiesTroubleshooting JVM Outages – 3 Fortune 500 case studies
Troubleshooting JVM Outages – 3 Fortune 500 case studies
Tier1 app
 
Orion Context Broker introduction 20250509
Orion Context Broker introduction 20250509Orion Context Broker introduction 20250509
Orion Context Broker introduction 20250509
Fermin Galan
 
Passive House Canada Conference 2025 Presentation [Final]_v4.ppt
Passive House Canada Conference 2025 Presentation [Final]_v4.pptPassive House Canada Conference 2025 Presentation [Final]_v4.ppt
Passive House Canada Conference 2025 Presentation [Final]_v4.ppt
IES VE
 
Wilcom Embroidery Studio Crack 2025 For Windows
Wilcom Embroidery Studio Crack 2025 For WindowsWilcom Embroidery Studio Crack 2025 For Windows
Wilcom Embroidery Studio Crack 2025 For Windows
Google
 
Solar-wind hybrid engery a system sustainable power
Solar-wind  hybrid engery a system sustainable powerSolar-wind  hybrid engery a system sustainable power
Solar-wind hybrid engery a system sustainable power
bhoomigowda12345
 
How I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetryHow I solved production issues with OpenTelemetry
How I solved production issues with OpenTelemetry
Cees Bos
 
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by AjathMobile Application Developer Dubai | Custom App Solutions by Ajath
Mobile Application Developer Dubai | Custom App Solutions by Ajath
Ajath Infotech Technologies LLC
 
[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts[gbgcpp] Let's get comfortable with concepts
[gbgcpp] Let's get comfortable with concepts
Dimitrios Platis
 
Programs as Values - Write code and don't get lost
Programs as Values - Write code and don't get lostPrograms as Values - Write code and don't get lost
Programs as Values - Write code and don't get lost
Pierangelo Cecchetto
 
wAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptxwAIred_LearnWithOutAI_JCON_14052025.pptx
wAIred_LearnWithOutAI_JCON_14052025.pptx
SimonedeGijt
 
GDS SYSTEM | GLOBAL DISTRIBUTION SYSTEM
GDS SYSTEM | GLOBAL  DISTRIBUTION SYSTEMGDS SYSTEM | GLOBAL  DISTRIBUTION SYSTEM
GDS SYSTEM | GLOBAL DISTRIBUTION SYSTEM
philipnathen82
 
Sequence Diagrams With Pictures (1).pptx
Sequence Diagrams With Pictures (1).pptxSequence Diagrams With Pictures (1).pptx
Sequence Diagrams With Pictures (1).pptx
aashrithakondapalli8
 
Medical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk ScoringMedical Device Cybersecurity Threat & Risk Scoring
Medical Device Cybersecurity Threat & Risk Scoring
ICS
 
Best HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRMBest HR and Payroll Software in Bangladesh - accordHRM
Best HR and Payroll Software in Bangladesh - accordHRM
accordHRM
 
Wilcom Embroidery Studio Crack Free Latest 2025
Wilcom Embroidery Studio Crack Free Latest 2025Wilcom Embroidery Studio Crack Free Latest 2025
Wilcom Embroidery Studio Crack Free Latest 2025
Web Designer
 
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint PresentationFrom Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
From Vibe Coding to Vibe Testing - Complete PowerPoint Presentation
Shay Ginsbourg
 
Buy vs. Build: Unlocking the right path for your training tech
Buy vs. Build: Unlocking the right path for your training techBuy vs. Build: Unlocking the right path for your training tech
Buy vs. Build: Unlocking the right path for your training tech
Rustici Software
 
Artificial hand using embedded system.pptx
Artificial hand using embedded system.pptxArtificial hand using embedded system.pptx
Artificial hand using embedded system.pptx
bhoomigowda12345
 
Adobe Media Encoder Crack FREE Download 2025
Adobe Media Encoder  Crack FREE Download 2025Adobe Media Encoder  Crack FREE Download 2025
Adobe Media Encoder Crack FREE Download 2025
zafranwaqar90
 
Robotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptxRobotic Process Automation (RPA) Software Development Services.pptx
Robotic Process Automation (RPA) Software Development Services.pptx
julia smits
 
Troubleshooting JVM Outages – 3 Fortune 500 case studies
Troubleshooting JVM Outages – 3 Fortune 500 case studiesTroubleshooting JVM Outages – 3 Fortune 500 case studies
Troubleshooting JVM Outages – 3 Fortune 500 case studies
Tier1 app
 
Orion Context Broker introduction 20250509
Orion Context Broker introduction 20250509Orion Context Broker introduction 20250509
Orion Context Broker introduction 20250509
Fermin Galan
 
Passive House Canada Conference 2025 Presentation [Final]_v4.ppt
Passive House Canada Conference 2025 Presentation [Final]_v4.pptPassive House Canada Conference 2025 Presentation [Final]_v4.ppt
Passive House Canada Conference 2025 Presentation [Final]_v4.ppt
IES VE
 
Wilcom Embroidery Studio Crack 2025 For Windows
Wilcom Embroidery Studio Crack 2025 For WindowsWilcom Embroidery Studio Crack 2025 For Windows
Wilcom Embroidery Studio Crack 2025 For Windows
Google
 
Ad

Enabling Data Scientists to easily create and own Kafka Consumers
  • 1. Enabling Data Scientists to easily create and own Kafka Consumers Stefan Krawczyk Mgr. Data Platform - Model Lifecycle @stefkrawczyk linkedin.com/in/skrawczyk Try out Stitch Fix → goo.gl/Q3tCQ3
  • 2. 2 - What is Stitch Fix? - Data Science @ Stitch Fix - Stitch Fix’s opinionated Kafka consumer - Learnings & Future Directions Agenda
  • 3. What is Stitch Fix What does the company do?
  • 4. Kafka Summit 2021 4 Stitch Fix is a personal styling service. Shop at your personal curated store. Check out what you like.
  • 5. Kafka Summit 2021 5 Data Science is behind everything we do. algorithms-tour.stitchfix.com Algorithms Org. - 145+ Data Scientists and Platform Engineers - 3 main verticals + platform Data Platform
  • 6. Kafka Summit 2021 6 whoami Stefan Krawczyk Mgr. Data Platform - Model Lifecycle Pre-covid look
  • 7. Data Science @ Stitch Fix Expectations we have on DS @ Stitch Fix
  • 8. Kafka Summit 2021 8 Most common approach to Data Science Typical organization: ● Horizontal teams ● Hand off ● Coordination required DATA SCIENCE / RESEARCH TEAMS ETL TEAMS ENGINEERING TEAMS
  • 9. Kafka Summit 2021 9 At Stitch Fix: ● Single Organization ● No handoff ● End to end ownership ● We have a lot of them! ● Built on top of data platform tools & abstractions. Full Stack Data Science See https://cultivating-algos.stitchfix.com/ DATA SCIENCE ETL ENGINEERING
  • 10. Kafka Summit 2021 10 Full Stack Data Science A typical DS flow at Stitch Fix Typical flow: ● Idea / Prototype ● ETL ● “Production” ● Eval/Monitoring/Oncall ● Start on next iteration
  • 11. Kafka Summit 2021 11 Full Stack Data Science A typical DS flow at Stitch Fix Production can mean: ● Web service ● Batch job / Table ● Kafka consumer Heavily biased towards Python.
  • 12. Kafka Summit 2021 12 Example use cases DS have built Kafka consumers for Example Kafka Consumers ● A/B testing bucket allocation ● Transforming raw inputs into features ● Saving data into feature stores ● Event driven model prediction ● Triggering workflows
  • 13. Stitch Fix’s opinionated Kafka consumer Code first, explanation second
  • 14. Kafka Summit 2021 14 Our “Hello world” Consumer Code [ ] Architecture [ ] Mechanics [ ]
  • 15. Kafka Summit 2021 15 To run this: > pip install sf_kafka > python -m sf_kafka.server hello_world_consumer A simple example Hello world consumer hello_world_consumer.py import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {}
  • 16. Kafka Summit 2021 16 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} A simple example Hello world consumer hello_world_consumer.py So what is this doing? To run this: > pip install sf_kafka > python -m sf_kafka.server hello_world_consumer
  • 17. Kafka Summit 2021 17 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. A simple example Hello world consumer hello_world_consumer.py
  • 18. Kafka Summit 2021 18 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. A simple example Hello world consumer hello_world_consumer.py
  • 19. Kafka Summit 2021 19 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. 3. Printing them to console. (DS would replace this with a call to their function()) A simple example Hello world consumer hello_world_consumer.py
  • 20. Kafka Summit 2021 20 import sf_kafka @sf_kafka.register(kafka_topic='some.topic', output_schema={}) def hello_world(messages: List[str]) -> dict: """Hello world example :param messages: list of strings, which are JSON objects. :return: empty dict, as we don't need to emit any events. """ list_of_dicts = [json.loads(m) for m in messages] print(f'Hello world I have consumed the following {list_of_dicts}') return {} 1. Python function that takes in a list of strings called messages. 2. We’re processing the messages into dictionaries. 3. Printing them to console.(DS would replace this with a call to their function()) 4. We are registering this function to consume from ‘some.topic’ with no output. A simple example Hello world consumer hello_world_consumer.py
  • 21. Kafka Summit 2021 21 So what’s really going on? When someone runs python -m sf_kafka.server hello_world_consumer Consumer Code ✅ Architecture [ ] Mechanics [ ]
  • 25. Kafka Summit 2021 25 [architecture diagram: steps 1–3]
  • 26. Kafka Summit 2021 26 [architecture diagram: steps 1–4]
  • 27. Kafka Summit 2021 27 [architecture diagram: steps 1–4, annotated with Platform Concerns vs DS Concerns]
  • 28. Kafka Summit 2021 28 Platform Concerns vs DS Concerns Consumer Code ✅ Architecture ✅ Mechanics [ ]
  • 29. Kafka Summit 2021 29 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format What does each side own Platform Concerns vs DS Concerns
  • 30. Kafka Summit 2021 30 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format ● Production operations: ○ Topic partitioning ○ Deployment vehicle for consumers ○ Monitoring hooks & tools What does each side own Platform Concerns vs DS Concerns
  • 31. Kafka Summit 2021 31 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format ● Production operations: ○ Topic partitioning ○ Deployment vehicle for consumers ○ Monitoring hooks & tools ● Configuration: ○ App name [required] ○ Which topic(s) to consume from [required] ○ Process from beginning/end? [optional] ○ Processing “batch” size [optional] ○ Number of consumers [optional] ● Python function that operates over a list ● Output topic & message [if any] ● Oncall What does each side own Platform Concerns vs DS Concerns
  • 32. Kafka Summit 2021 32 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format ● Production operations: ○ Topic partitioning ○ Deployment vehicle for consumers ○ Monitoring hooks & tools ● Configuration: ○ App name [required] ○ Which topic(s) to consume from [required] ○ Process from beginning/end? [optional] ○ Processing “batch” size [optional] ○ Number of consumers [optional] ● Python function that operates over a list ● Output topic & message [if any] ● Oncall What does each side own Platform Concerns vs DS Concerns Can change without DS involvement -- just need to rebuild their app.
  • 33. Kafka Summit 2021 33 Platform Concerns DS Concerns ● Kafka consumer operation: ○ What python kafka client to use ○ Kafka client configuration ○ Processing assumptions ■ At least once or at most once ○ How to write back to kafka ■ Direct to cluster or via a proxy? ■ Message serialization format ● Production operations: ○ Topic partitioning ○ Deployment vehicle for consumers ○ Topic monitoring hooks & tools ● Configuration: ○ App name [required] ○ Which topic(s) to consume from [required] ○ Process from beginning/end? [optional] ○ Processing “batch” size [optional] ○ Number of consumers [optional] ● Python function that operates over a list ● Output topic & message [if any] ● Oncall What does each side own Platform Concerns vs DS Concerns Requires coordination with DS
  • 34. Kafka Summit 2021 34 Platform Concern Choice Benefit Kafka Client Processing assumption Salient choices we made on Platform
  • 35. Kafka Summit 2021 35 Platform Concern Choice Benefit Kafka Client python confluent-kafka (librdkafka) librdkafka is very performant & stable. Processing assumption Salient choices we made on Platform
  • 36. Kafka Summit 2021 36 Platform Concern Choice Benefit Kafka Client python confluent-kafka (librdkafka) librdkafka is very performant & stable. Processing assumption At least once; functions should be idempotent. Enables very easy error recovery strategy: ● Consumer app breaks until it is fixed; can usually wait until business hours. ● No loss of events. ● Monitoring trigger is consumer lag. Salient choices we made on Platform
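The at-least-once contract above (a broken consumer simply stops, and offsets only advance after successful processing) can be simulated without a Kafka cluster. This is a minimal sketch of the commit-after-process idea using a Python list as the "topic"; real code would use a Kafka client, and `consume_at_least_once` is an illustrative name, not part of `sf_kafka`.

```python
# Minimal simulation of at-least-once delivery: the offset ("commit")
# only advances after the handler succeeds, so a crash re-delivers the
# in-flight batch on restart. This is why handlers must be idempotent.
from typing import Callable, List


def consume_at_least_once(log: List[str], start_offset: int,
                          batch_size: int,
                          handler: Callable[[List[str]], None]) -> int:
    """Process messages in batches, advancing the offset only on success."""
    offset = start_offset
    while offset < len(log):
        batch = log[offset:offset + batch_size]
        handler(batch)          # may raise; offset is NOT advanced then
        offset += len(batch)    # "commit" only after success
    return offset


seen: List[str] = []


def flaky(batch: List[str]) -> None:
    """Fails the first time it sees 'boom', after a partial side effect."""
    if 'boom' in batch and 'boom' not in seen:
        seen.extend(batch)  # side effect happened before the failure
        raise RuntimeError('transient failure')
    seen.extend(batch)


log = ['a', 'boom', 'c']
try:
    consume_at_least_once(log, 0, 2, flaky)
except RuntimeError:
    pass  # consumer app "breaks"; monitoring would flag consumer lag
# On restart we resume from the last commit (offset 0), so ['a', 'boom']
# is delivered a second time: no events lost, some processed twice.
offset = consume_at_least_once(log, 0, 2, flaky)
```

Running this shows `'boom'` processed twice across the crash and restart, which is exactly the duplicate-processing scenario that makes idempotent functions the DS-side requirement.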
  • 37. Kafka Summit 2021 37 Platform Concern Choice Benefit Message serialization format Do we want to write back to kafka directly? Salient choices we made on Platform
  • 38. Kafka Summit 2021 38 Platform Concern Choice Benefit Message serialization format JSON Easy mapping to and from python dictionaries. Easy to grok for DS. * python support for other formats wasn’t great. Do we want to write back to kafka directly? Salient choices we made on Platform
  • 39. Kafka Summit 2021 39 Platform Concern Choice Benefit Message serialization format JSON Easy mapping to and from python dictionaries. Easy to grok for DS. * python support for other formats wasn’t great. Do we want to write back to kafka directly? Write via proxy service first. Enabled: ● Not having producer code in the engine. ● Ability to validate/introspect all messages. ● Ability to augment/change minor format structure without having to redeploy all consumers. Salient choices we made on Platform
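The proxy benefits listed above (validating every message and augmenting the envelope without redeploying consumers) can be sketched as a single validation step in front of the producer. The function name `validate_and_wrap` and the envelope fields below are assumptions for illustration, not Stitch Fix's actual proxy API.

```python
# Illustrative sketch of a write-proxy's job: validate an outbound
# payload against the registered output schema, then wrap it in a JSON
# envelope the proxy controls. Because the envelope is added centrally,
# its structure can evolve without touching any consumer code.
import json
import time
from typing import Any, Dict


def validate_and_wrap(topic: str, payload: Dict[str, Any],
                      schema: Dict[str, type]) -> str:
    """Type-check payload fields against a simple schema, then envelope it."""
    for field, expected_type in schema.items():
        if not isinstance(payload.get(field), expected_type):
            raise ValueError(f'{field} must be {expected_type.__name__}')
    envelope = {'topic': topic, 'ts': time.time(), 'data': payload}
    return json.dumps(envelope)  # JSON maps cleanly to/from python dicts


schema = {'client_id': int, 'score': float}
wire = validate_and_wrap('predict.topic', {'client_id': 7, 'score': 0.9}, schema)
decoded = json.loads(wire)
```

A malformed payload (say, a string `client_id`) is rejected at the proxy with a clear error, rather than landing on the topic and breaking downstream consumers.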
  • 40. Kafka Summit 2021 40 Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰
  • 41. Kafka Summit 2021 41 Completing the production story Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰
  • 42. Kafka Summit 2021 42 Completing the production story Consumer Code ✅ Architecture ✅ Mechanics ✅ What’s missing? ⍰ Self-service ^
  • 43. Kafka Summit 2021 43 The self-service story of how a DS gets a consumer to production Completing the production story Example Use Case: Event Driven Model Prediction 1. Client signs up & fills out profile. 2. Event is sent - client.signed_up. 3. Predict something about the client. 4. Emit predictions back to Kafka. 5. Use this for email campaigns. -> $$
  • 44. Kafka Summit 2021 44 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume.
  • 45. Kafka Summit 2021 45 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event
  • 46. Kafka Summit 2021 46 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event def create_prediction(client: dict) -> dict: # DS would write side effects or fetches here. # E.g. grab features, predict, # create output message; prediction = ... return make_ouput_event(client, prediction) @sf_kaka.register( kafka_topic='client.signed_up', output_schema={'predict.topic': schema}) def predict_foo(messages: List[str]) -> dict: """Predict XX about a client. ...""" clients = [json.loads(m) for m in messages] predictions = [create_prediction(c) for c in clients] return {'predict.topic': predictions} my_prediction.py
  • 50. Kafka Summit 2021 50 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code: a. Create a function & decorate it to process events b. If outputting an event, write a schema c. Commit code to a git repository Event my_prediction.py:
"""Schema that we want to validate against."""
schema = {
    'metadata': {
        'timestamp': str,
        'id': str,
        'version': str
    },
    'payload': {
        'some_prediction_value': float,
        'client': int
    }
}
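A type-based schema dict like the one above can be checked with a small recursive walker. The `validate` helper here is a hypothetical sketch of what such validation might look like, not the platform's actual validator (a real system might use a library such as jsonschema instead):

```python
def validate(event: dict, schema: dict) -> bool:
    """Check that every schema key exists in the event and that leaf
    values have the declared type; nested dicts are checked recursively.
    Hypothetical sketch, not the platform's real validator."""
    for key, expected in schema.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not validate(event[key], expected):
                return False
        elif not isinstance(event[key], expected):
            return False
    return True

schema = {
    'metadata': {'timestamp': str, 'id': str, 'version': str},
    'payload': {'some_prediction_value': float, 'client': int},
}
good_event = {
    'metadata': {'timestamp': '2021-05-11T00:00:00Z', 'id': 'abc', 'version': '1'},
    'payload': {'some_prediction_value': 0.9, 'client': 42},
}
```

Validating at produce time keeps malformed events out of the output topic, so downstream consumers can trust the contract.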
  • 51. Kafka Summit 2021 51 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line: a. Handles python environment creation b. Builds docker container c. Deploys
  • 52. Kafka Summit 2021 52 Self-service deployment via command line Completing the production story
  • 56. Kafka Summit 2021 56 Self-service deployment via command line Completing the production story DS Touch Points Can be in production in < 1 hour Self-service!
  • 57. Kafka Summit 2021 57 The self-service story of how a DS gets a consumer to production Completing the production story 1. Determine the topic(s) to consume. 2. Write code 3. Deploy via command line 4. Oncall: a. Small runbook
  • 61. Learnings & Future Directions What we learned from this and where we’re looking to go.
  • 65. Kafka Summit 2021 65 What? Learning Do they use it? ✅ 👍 Focusing on the function 1. All they need to know about kafka is that it’ll give them a list of events. 2. Leads to better separation of concerns: a. Can split driver code versus their logic. b. Test driven development is easy. At least once processing 1. They enjoy easy error recovery; gives DS time to fix things. 2. Idempotency requirement not an issue. Learnings - DS Perspective
  • 68. Kafka Summit 2021 68 What? Learning Writing back via proxy service 1. Helped early on with some minor message format adjustments & validation. 2. Would recommend writing back directly if we were to start again. a. Writing back directly leads to better performance. Central place for all things kafka Very useful to have a central place to: 1. Understand topics & topic contents. 2. Having “off the shelf” ability to materialize stream to a datastore removed need for DS to manage/optimize this process. E.g. elasticsearch, data warehouse, feature store. Learnings - Platform Perspective (1/2)
  • 70. Kafka Summit 2021 70 What? Learning Using internal async libraries Using internal asyncio libs is cumbersome for DS. A native asyncio framework would feel better.* Lineage & Lineage Impacts If there is a chain of consumers*, we didn't have easy introspection into: ● The processing speed of the full chain ● What the chain actually was Learnings - Platform Perspective (2/2) * we ended up creating a narrowly focused micro-framework addressing these two issues using aiokafka.
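The batching core of a native asyncio consumer can be sketched without a broker. With aiokafka, `stream` would be an `AIOKafkaConsumer` (which is itself an async iterator); here it is any async iterator, so the logic is testable standalone. This is a hypothetical sketch, not the internal micro-framework the footnote mentions:

```python
import asyncio
from typing import AsyncIterator, List

async def _fill(stream: AsyncIterator[str], batch: List[str], max_size: int) -> None:
    # Append messages until the batch is full (or the stream ends).
    async for msg in stream:
        batch.append(msg)
        if len(batch) >= max_size:
            return

async def collect_batch(stream: AsyncIterator[str], max_size: int,
                        timeout: float) -> List[str]:
    """Return up to max_size messages, or whatever arrived before timeout."""
    batch: List[str] = []
    try:
        await asyncio.wait_for(_fill(stream, batch, max_size), timeout)
    except asyncio.TimeoutError:
        pass  # a partial batch is fine; hand over what we have
    return batch

async def fake_stream():
    # Stand-in for AIOKafkaConsumer in a broker-free test.
    for m in ['a', 'b', 'c']:
        yield m

batch = asyncio.run(collect_batch(fake_stream(), max_size=2, timeout=1.0))
```

Because `batch` is mutated in place, a timeout still yields the messages received so far rather than dropping them.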
  • 73. Kafka Summit 2021 73 What? Why? Being able to replace different subcomponents & assumptions of the system more easily. Cleaner abstractions & modularity: ● Want to stop business logic leaking into the engine. ● Making parts pluggable means we can easily change/swap out e.g. schema validation, serialization format, how we write back to kafka, processing assumptions, asyncio support, etc. Exploring stream processing frameworks like Kafka Streams & Faust. Stream processing over windows is slowly becoming something more DS ask about. Writing an open source version Hypothesis that this is valuable and that the community would be interested; would you be? Future Directions
  • 75. Kafka Summit 2021 75 TL;DR: Summary Kafka + Data Scientists @ Stitch Fix: ● We have a self-service platform for Data Scientists to deploy kafka consumers ● We achieve self-service through a separation of concerns: ○ Data Scientists focus on functions to process events ○ Data Platform provides guardrails for kafka operations