SlideShare a Scribd company logo
11
Is that a Time Machine?
Some Design Patterns for Real-World Machine Learning Systems
Justin Basilico
Page Algorithms Engineering
ICML ML Systems Workshop
June 24, 2016
@JustinBasilico
DeLorean image by JMortonPhoto.com & OtoGodfrey.com
22
Introduction
3
Focus
2006 2016
4
Netflix Scale
 > 81M members
 > 190 countries
 > 1000 device types
 > 3B hours/month
 > 36% of peak US
downstream traffic
5
Goal
Help members find content to watch and enjoy
to maximize member satisfaction and retention
6
Machine Learning is Everywhere
Rows
Ranking
Over 80% of what
people watch
comes from our
recommendations
7
Models & Algorithms
 Regression (linear, logistic, elastic net)
 SVD and other Matrix Factorizations
 Factorization Machines
 Restricted Boltzmann Machines
 Deep Neural Networks
 Markov Models and Graph Algorithms
 Clustering
 Latent Dirichlet Allocation
 Gradient Boosted Decision
Trees/Random Forests
 Gaussian Processes
 …
8
Systems
 AWS Cloud
 Online:
 Microservices
 Java
 EVCache, Cassandra
 Offline:
 Hive on S3
 Spark, Docker, Meson
Netflix.Hermes
Netflix.Manhattan
Nearline
Computation
Models
Online
Data Service
Offline Data
Model
training
Online
Computation
Event Distribution
User Event
Queue
Algorithm
Service
UI Client
Member
Query results
Recommendations
NEARLINE
Machine
Learning
Algorithm
Machine
Learning
Algorithm
Offline
Computation Machine
Learning
Algorithm
Play, Rate,
Browse...
OFFLINE
ONLINE
More details on Netflix Techblog
99
Design Patterns
10
Why Design Patterns for ML Systems?
Idea Experiment Live
Problem
Problem
11
Design patterns provide…
 Common solutions to common problems
 No need to re-invent them
 A menu of approaches
 Reusable abstractions
 Transcend specific implementations
 Common terminology
 Eases communications of how something works
12
Some machine learning patterns…
 The Hulk
 The Lumberjack
 The Online Archive
 The Time Machine
 The Sentinel
 The Precog
 The Dagobah
 The Anytime Algorithm
 The Parameter Oracle
 The LEGO
 The Terminator
 The Inception
 The Feature Encoder
 The Hoarder
 The Transformer
 The Parameter Server
 The Log Space
 The Matrix Transposed
 The Overflow
 The Substitute
Thanks to: Aish Fenton, Yves Raimond, Dave Ray, Hossein Taghavi, Anuj Shah, DB Tsai, …
13
Application
Machine Learning in an Application
Machine Learning
Application
?Machine
Learned Model
Feature
Encoding
Output
Decoding
Predictor
14
Antipattern:
The Phantom Menace
(AKA Training/Serving Skew)
Different
code/data/platform
between training and
applying model
© Lucasfilm Ltd.
15
Training Pipeline
Application
“Typical” ML Pipeline: A tale of two worlds
Historical
Data
Generate
Features
Train Models
Validate &
Select Models
Application
Logic
Live
Data
Load
Model
Offline
Online
Collect Labels
Publish
Model
Evaluate
Model
Experimentation
16
The Sentinel
Validate model/data in
online environment before
letting it go live“You shall not pass!”
© New Line Cinema
17
Sentinel
Service
Application
Sentinel: Structure
Model
Model
Publisher
Model
Loader
Model Loader
Model
Validator
Offline
Online
Alert
Republish
Some potential checks:
• File format is valid
• Dependent data is available
• Accuracy on shadow live data
• Feature distributions match
• Output is properly calibrated
18
Sentinel
 Example: Checking that new ranking model is valid and
performs better than previous one
 Pros:
 Using a model requires both code and data are available
 Models may need to be versioned along-side code changes
 Ensure that a new model is no worse than previous one
 Cons:
 Sentinel needs to be in sync with application code
 Difficult to choose failure thresholds for data-based checks
19
The Hulk
(AKA Offline Precompute)
Train and evaluate your
full model offline then
publish final outputs
Scale for production
by batching
and brute force
© Disney
© Disney
20
Offline Precompute: Example Structure
Application
Cache
Historical
Data
OfflineOnline
Model Evaluation
Predictor
Data
Publisher
Generate
Features
Decode
Output
lookup
key -> output
save
21
Offline Precompute (aka The Hulk)
 Example: Computing unpersonalized video-to-video similarities
 Pros:
 Easy to set up based on experiment code
 Decouples implementation from online platform
 Can use more computationally expensive models
 Cons:
 Can’t depend on online facts or fresh data
 May have data gaps (e.g. handling new videos, users, etc.)
 May require cleanup to make consistent with online data
 Model output based on offline data; may not be properly calibrated
22
The Lumberjack
(AKA Feature Logging)
Train model on features
logged online from within
an application
Image via YouTube
23
Application
Feature Logging: Structure
Live
Data
Feature
Log Train Models
Predictor
Labels
log
id
Feature
Config
Generate
Features
Decode
Output
Model Evaluation
Offline
Online
24
 Example: Features of pages, rows, and videos in page generation
 Pros:
 Train on features exactly as seen online
 Easy to deploy trained model
 Can include impact of up-stream application logic
 Cons:
 Requires production-grade feature code and deployment
 Takes time to log enough data
 Need all dependent data also in production
 Adds risk to production servers for experimental features
 Feature data can be large; may require sampling
Feature Logging (aka The Lumberjack)
25
The Online Archive
Have online services save
history and expose to
offline systems via batch
interface
© Lucasfilm Ltd.
26
Online Archive: Structure
Live +
Historical
Data
Generate
Features
Collect Labels
Offline
Online
Application
Train Model
batch
interface
live
interface
27
Online Archive
 Example: Filtering online viewing history
 Pros:
 Provides access to online view of data at any time
 Can experiment with new features
 Cons:
 All dependent data needs to keep track of all history
 Only works for small data
 Requires batch interface also available within application
 May be other processes that edit history (e.g. slow arriving events)
 Service needs to handle two very different request loads so batch queries
don’t bring down the live system
28
The Time Machine
Snapshot facts and share
feature generation code
DeLorean image by JMortonPhoto.com & OtoGodfrey.com
29
Application
Time Machine: Example Structure
FeaturesFact Log
Feature
Config
Predictor
Generate
Features
Decode
Output
Online
Snapshotter
Model Evaluation
Generate
Features
Labels
Data
Service
Bulk
Data
Other
Models
Live Data
30
Time Machine
 Example: Training ranking models in Spark*
 Pros:
 Easy to experiment with new features offline
 Allows testing impact of modifying non-ML components
 Can construct full application output after trying new model
 Can share snapshots across applications to help build new ones
 Cons:
 Fact data volume can be high; may require sampling
 Snapshotting requires deciding contexts to collect data for
* See http://bit.ly/sparktimetravel for more info
3131
Conclusions
32
Conclusion
 Some design patterns for avoiding online-offline discrepancies
 The Sentinel
 The Hulk
 The Lumberjack
 The Online Archive
 The Time Machine
 What useful patterns do you see for ML systems?
 Share them!
33
Thank You Justin Basilico
jbasilico@netflix.com
@JustinBasilico
Ad

More Related Content

What's hot (20)

Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Förderverein Technische Fakultät
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
Linas Baltrunas
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
Justin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Sudeep Das, Ph.D.
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
Chris Johnson
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
Oguz Semerci
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
Yves Raimond
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
Parmeshwar Khurd
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 
Search, Discovery and Questions at Quora
Search, Discovery and Questions at QuoraSearch, Discovery and Questions at Quora
Search, Discovery and Questions at Quora
Nikhil Dandekar
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Justin Basilico
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
Anoop Deoras
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Yves Raimond
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
Linas Baltrunas
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
Justin Basilico
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
Jaya Kawale
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix Scale
Aish Fenton
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
Faisal Siddiqi
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
Linas Baltrunas
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
Justin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Sudeep Das, Ph.D.
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
Chris Johnson
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
Oguz Semerci
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
Yves Raimond
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
Parmeshwar Khurd
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 
Search, Discovery and Questions at Quora
Search, Discovery and Questions at QuoraSearch, Discovery and Questions at Quora
Search, Discovery and Questions at Quora
Nikhil Dandekar
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Justin Basilico
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
Anoop Deoras
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Yves Raimond
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
Linas Baltrunas
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
Jaya Kawale
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix Scale
Aish Fenton
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
Faisal Siddiqi
 

Similar to Is that a Time Machine? Some Design Patterns for Real World Machine Learning Systems (20)

Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
MLconf
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Kai Wähner
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
Data Science Milan
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
confluent
 
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
Andrey Sadovykh
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
IanFurlong4
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Kai Wähner
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
confluent
 
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September London
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September LondonIcon solutions presentation - Pure Hybrid Cloud Event, 11th September London
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September London
IBM Systems UKI
 
2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges
Ivica Crnkovic
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Graphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationGraphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform Optimization
Big Data Value Association
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud

Kai Wähner
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
Sri Ambati
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical Systems
Ákos Horváth
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
Databricks
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
IncQuery Labs
 
Model driven engineering for big data management systems
Model driven engineering for big data management systemsModel driven engineering for big data management systems
Model driven engineering for big data management systems
Marcos Almeida
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
MLconf
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Kai Wähner
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
Data Science Milan
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
confluent
 
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
Andrey Sadovykh
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
IanFurlong4
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Kai Wähner
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
confluent
 
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September London
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September LondonIcon solutions presentation - Pure Hybrid Cloud Event, 11th September London
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September London
IBM Systems UKI
 
2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges
Ivica Crnkovic
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Graphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationGraphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform Optimization
Big Data Value Association
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud

Kai Wähner
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
Sri Ambati
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical Systems
Ákos Horváth
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
Databricks
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
IncQuery Labs
 
Model driven engineering for big data management systems
Model driven engineering for big data management systemsModel driven engineering for big data management systems
Model driven engineering for big data management systems
Marcos Almeida
 
Ad

Recently uploaded (20)

RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
SOFTTECHHUB
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
The Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI IntegrationThe Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI Integration
Re-solution Data Ltd
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Wonjun Hwang
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Markus Eisele
 
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent LasterAI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
All Things Open
 
Unlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web AppsUnlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web Apps
Maximiliano Firtman
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz
 
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptxSmart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Seasia Infotech
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
SOFTTECHHUB
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
The Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI IntegrationThe Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI Integration
Re-solution Data Ltd
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Wonjun Hwang
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Markus Eisele
 
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent LasterAI 3-in-1: Agents, RAG, and Local Models - Brent Laster
AI 3-in-1: Agents, RAG, and Local Models - Brent Laster
All Things Open
 
Unlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web AppsUnlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web Apps
Maximiliano Firtman
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz
 
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptxSmart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Seasia Infotech
 
Ad

Is that a Time Machine? Some Design Patterns for Real World Machine Learning Systems

  • 1. 11 Is that a Time Machine? Some Design Patterns for Real-World Machine Learning Systems Justin Basilico Page Algorithms Engineering ICML ML Systems Workshop June 24, 2016 @JustinBasilico DeLorean image by JMortonPhoto.com & OtoGodfrey.com
  • 4. 4 Netflix Scale  > 81M members  > 190 countries  > 1000 device types  > 3B hours/month  > 36% of peak US downstream traffic
  • 5. 5 Goal Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 6. 6 Machine Learning is Everywhere Rows Ranking Over 80% of what people watch comes from our recommendations
  • 7. 7 Models & Algorithms  Regression (linear, logistic, elastic net)  SVD and other Matrix Factorizations  Factorization Machines  Restricted Boltzmann Machines  Deep Neural Networks  Markov Models and Graph Algorithms  Clustering  Latent Dirichlet Allocation  Gradient Boosted Decision Trees/Random Forests  Gaussian Processes  …
  • 8. 8 Systems  AWS Cloud  Online:  Microservices  Java  EVCache, Cassandra  Offline:  Hive on S3  Spark, Docker, Meson Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE More details on Netflix Techblog
  • 10. 10 Why Design Patterns for ML Systems? Idea Experiment Live Problem Problem
  • 11. 11 Design patterns provide…  Common solutions to common problems  No need to re-invent them  A menu of approaches  Reusable abstractions  Transcend specific implementations  Common terminology  Eases communications of how something works
  • 12. 12 Some machine learning patterns…  The Hulk  The Lumberjack  The Online Archive  The Time Machine  The Sentinel  The Precog  The Dagobah  The Anytime Algorithm  The Parameter Oracle  The LEGO  The Terminator  The Inception  The Feature Encoder  The Hoarder  The Transformer  The Parameter Server  The Log Space  The Matrix Transposed  The Overflow  The Substitute Thanks to: Aish Fenton, Yves Raimond, Dave Ray, Hossein Taghavi, Anuj Shah, DB Tsai, …
  • 13. 13 Application Machine Learning in an Application Machine Learning Application ?Machine Learned Model Feature Encoding Output Decoding Predictor
  • 14. 14 Antipattern: The Phantom Menace (AKA Training/Serving Skew) Different code/data/platform between training and applying model © Lucasfilm Ltd.
  • 15. 15 Training Pipeline Application “Typical” ML Pipeline: A tale of two worlds Historical Data Generate Features Train Models Validate & Select Models Application Logic Live Data Load Model Offline Online Collect Labels Publish Model Evaluate Model Experimentation
  • 16. 16 The Sentinel Validate model/data in online environment before letting it go live“You shall not pass!” © New Line Cinema
  • 17. 17 Sentinel Service Application Sentinel: Structure Model Model Publisher Model Loader Model Loader Model Validator Offline Online Alert Republish Some potential checks: • File format is valid • Dependent data is available • Accuracy on shadow live data • Feature distributions match • Output is properly calibrated
  • 18. 18 Sentinel  Example: Checking that new ranking model is valid and performs better than previous one  Pros:  Using a model requires both code and data are available  Models may need to be versioned along-side code changes  Ensure that a new model is no worse than previous one  Cons:  Sentinel needs to be in sync with application code  Difficult to choose failure thresholds for data-based checks
  • 19. 19 The Hulk (AKA Offline Precompute) Train and evaluate your full model offline then publish final outputs Scale for production by batching and brute force © Disney © Disney
  • 20. 20 Offline Precompute: Example Structure Application Cache Historical Data OfflineOnline Model Evaluation Predictor Data Publisher Generate Features Decode Output lookup key -> output save
  • 21. 21 Offline Precompute (aka The Hulk)  Example: Computing unpersonalized video-to-video similarities  Pros:  Easy to set up based on experiment code  Decouples implementation from online platform  Can use more computationally expensive models  Cons:  Can’t depend on online facts or fresh data  May have data gaps (e.g. handling new videos, users, etc.)  May require cleanup to make consistent with online data  Model output based on offline data; may not be properly calibrated
  • 22. 22 The Lumberjack (AKA Feature Logging) Train model on features logged online from within an application Image via YouTube
  • 23. 23 Application Feature Logging: Structure Live Data Feature Log Train Models Predictor Labels log id Feature Config Generate Features Decode Output Model Evaluation Offline Online
  • 24. 24  Example: Features of pages, rows, and videos in page generation  Pros:  Train on features exactly as seen online  Easy to deploy trained model  Can include impact of up-stream application logic  Cons:  Requires production-grade feature code and deployment  Takes time to log enough data  Need all dependent data also in production  Adds risk to production servers for experimental features  Feature data can be large; may require sampling Feature Logging (aka The Lumberjack)
  • 25. 25 The Online Archive Have online services save history and expose to offline systems via batch interface © Lucasfilm Ltd.
  • 26. 26 Online Archive: Structure Live + Historical Data Generate Features Collect Labels Offline Online Application Train Model batch interface live interface
  • 27. 27 Online Archive  Example: Filtering online viewing history  Pros:  Provides access to online view of data at any time  Can experiment with new features  Cons:  All dependent data needs to keep track of all history  Only works for small data  Requires batch interface also available within application  May be other processes that edit history (e.g. slow arriving events)  Service needs to handle two very different request loads so batch queries don’t bring down the live system
  • 28. 28 The Time Machine Snapshot facts and share feature generation code DeLorean image by JMortonPhoto.com & OtoGodfrey.com
  • 29. 29 Application Time Machine: Example Structure FeaturesFact Log Feature Config Predictor Generate Features Decode Output Online Snapshotter Model Evaluation Generate Features Labels Data Service Bulk Data Other Models Live Data
  • 30. 30 Time Machine  Example: Training ranking models in Spark*  Pros:  Easy to experiment with new features offline  Allows testing impact of modifying non-ML components  Can construct full application output after trying new model  Can share snapshots across applications to help build new ones  Cons:  Fact data volume can be high; may require sampling  Snapshotting requires deciding contexts to collect data for * See http://bit.ly/sparktimetravel for more info
  • 32. 32 Conclusion  Some design patterns for avoiding online-offline discrepancies  The Sentinel  The Hulk  The Lumberjack  The Online Archive  The Time Machine  What useful patterns do you see for ML systems?  Share them!
  • 33. 33 Thank You Justin Basilico jbasilico@netflix.com @JustinBasilico
  翻译: