SlideShare a Scribd company logo
Making Netflix Machine
Learning Algorithms
Reliable
Tony Jebara & Justin Basilico
ICML Reliable ML in the Wild Workshop
2017-08-11
2
2006
6M members
US only
3
2017
> 100M members
> 190 countries
Goal:
Help members find content to watch and enjoy to
maximize member satisfaction and retention
Algorithm Areas
▪ Personalized Ranking
▪ Top-N Ranking
▪ Trending Now
▪ Continue Watching
▪ Video-Video Similarity
▪ Personalized Page Generation
▪ Search
▪ Personalized Image Selection
▪ ...
Models & Algorithms
▪ Regression (linear, logistic, ...)
▪ Matrix Factorization
▪ Factorization Machines
▪ Clustering & Topic Models
▪ Bayesian Nonparametrics
▪ Tree Ensembles (RF, GBDTs, …)
▪ Neural Networks (Deep, RBMs, …)
▪ Gaussian Processes
▪ Bandits
▪ …
A/B tests validate an overall approach works in
expectation
But they run in online production, so every A/B
tested model needs to be reliable
Innovation Cycle
Idea
Offline
Experiment
Full Online
Deployment
Online A/B
Test
1) Collect massive data sets
2) Try billions of hypotheses to find* one(s) with support
*Find with computational and statistical efficiency
Batch Learning
USERS
TIME
Collect Learn A/B Test Roll-out
Data Model
A
B
Batch Learning
USERS
TIME
A
BREGRET
Batch Learning
Collect Learn A/B Test Roll-out
Data Model
Explore and exploit → Less regret
Helps cold start models in the wild
Maintain some exploration for nonstationarity
Adapt reliably to changing and new data
Epsilon-greedy, UCB, Thompson Sampling, etc.
USERS
TIME
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
Bandit Learning
USERS
TIME
Bandit Learning
1) Uniform population of hypotheses
2) Choose a random hypothesis h
3) Act according to h and observe outcome
4) Re-weight hypotheses
5) Go to 2
TS
Bandits for selecting images
Bandits for selecting images
Different preferences for genre/theme portrayed
Contextual bandits to personalize images
Different preferences for cast members
Contextual bandits to personalize images
Putting Machine
Learning In Production
18
Training Pipeline
Application
“Typical” ML Pipeline: A tale of two worlds
Historical
Data
Generate
Features
Train Models
Validate &
Select
Models
Application
Logic
Live
Data
Load
Model
Offline
Online
Collect
Labels
Publish
Model
Evaluate
Model
Experimentation
Offline: Model trains reliably
Online: Model performs reliably
… plus all involved systems and data
What needs to be reliable?
Detection
Response
Prevention
Reliability Approach
Reliability in Training
To be reliable, first learning must be repeatable
Automate retraining of models
Akin to continuous deployment in software engineering
How often depends on application; typically daily
Detect problems and fail fast to avoid using a bad model
Retraining
Example Training Pipeline
● Periodic retraining to refresh models
● Workflow system to manage pipeline
● Each step is a Spark or Docker job
● Each step has checks in place
○ Response: Stop workflow and send alerts
What to check?
● For both absolute and relative to previous runs...
● Offline metrics on hold-out set
● Data size for train and test
● Feature distributions
● Feature importance
● Large (unexpected) changes in output between models
● Model coverage (e.g. number of items it can predict)
● Model integrity
● Error counters
● ...
Example 1: number of samples
Numberoftrainingsamples
Time
Example 1: number of samples
Numberoftrainingsamples
Time
Alarm fires and
model is not
published due to
anomaly
Example 2: offline metrics
Liftw.r.tbaseline
Time
Example 2: offline metrics
Liftw.r.tbaseline
Time
Alarm fires and model is
not published due to
anomaly
Absolute threshold: if lift
< X then alarm also fires
Testing
- Unit testing and code coverage
- Integration testing
Improve quality and reliability of upstream data feeds
Reuse same data and code offline and online to avoid
training/serving skew
Running from multiple random seeds
Preventing offline training failures
Reliability Online
Check for invariants in inputs and outputs
Can catch many unexpected changes, bad assumptions
and engineering issues
Examples:
- Feature values are out-of-range
- Output is NaN or Infinity
- Probabilities are < 0 or > 1 or don’t sum to 1
Basic sanity checks
Online staleness checks can cover a wide range of failures in the training
and publishing
Example checks:
- How long ago was the most recent model published?
- How old is the data it was trained on?
- How old are the inputs it is consuming?
Model and input staleness
Stale
Model
Ageofmodel
Track the quality of the model
- Compare prediction to actual
behavior
- Online equivalents of offline metrics
For online learning or bandits reserve a
fraction of data for a simple policy (e.g.
epsilon-greedy) as a sanity check
Online metrics
Your model isn’t working right or your input data
is bad: Now what?
Common approaches in personalization space:
- Use previous model?
- Use previous output?
- Use simplified model or heuristic?
- If calling system is resilient: turn off that
subsystem?
Response: Graceful Degradation
Want to choose personalized images per profile
- Image lookup has O(10M) requests per second (e.g.
“House of Cards” -> Image URL)
Approach:
- Precompute show to image mapping per user
near-line system
- Store mapping in fast distributed cache
- At request time
- Lookup user-specific mapping in cache
- Fallback to unpersonalized results
- Store mapping for request
- Secondary fallback to default image for a
missing show
Example: Image Precompute Personalized Selection
(Precompute)
Unpersonalized
Default Image
Netflix runs 100% in AWS cloud
Needs to be reliable on unreliable infrastructure
Want a service to operate when an AWS instance
fails?
Randomly terminate instances (Chaos Monkey)
Want Netflix to operate when an entire AWS region
is having a problem?
Disable entire AWS regions (Chaos Kong)
Prevention: Failure Injection
What failure scenarios do you want your model to be
robust to?
Models are very sensitive to their input data
- Can be noisy, corrupt, delayed, incomplete, missing,
unknown
Train model to be resilient by injecting these conditions
into the training and testing data
Failure Injection for ML
Want to add a feature for type of row on homepage
Genre, Because you watched, Top picks, ...
Problem: Model may see new types online before in training data
Solution: Add unknown category and perturb a small fraction of
training data to that type
Rule-of-thumb: Apply for all categorical features unless new
category isn’t possible (Days of week -> OK, countries -> Not)
Example: Categorical features
Conclusions.
● Consider online learning and bandits
● Build off best practices from software engineering
● Automate as much as possible
● Adjust your data to cover the conditions you want
your model to be resilient to
● Detect problems and degrade gracefully
Take aways
Thank you.
Tony Jebara & Justin Basilico
Yes, we are hiring!
Ad

More Related Content

What's hot (20)

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
Linas Baltrunas
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Anoop Deoras
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
Anoop Deoras
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
Parmeshwar Khurd
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
Sudeep Das, Ph.D.
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
Justin Basilico
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
Harald Steck
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
Justin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Sudeep Das, Ph.D.
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
Justin Basilico
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
Linas Baltrunas
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
Yves Raimond
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
Grace T. Huang
 
Recommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixRecommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at Netflix
Jiangwei Pan
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
Jaya Kawale
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
Fernando Amat
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
Anoop Deoras
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
Linas Baltrunas
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Anoop Deoras
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
Anoop Deoras
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
Parmeshwar Khurd
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
Sudeep Das, Ph.D.
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
Justin Basilico
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
Harald Steck
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
Justin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Sudeep Das, Ph.D.
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
Justin Basilico
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
Linas Baltrunas
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
Yves Raimond
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
Grace T. Huang
 
Recommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixRecommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at Netflix
Jiangwei Pan
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
Jaya Kawale
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
Fernando Amat
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
Anoop Deoras
 

Similar to Making Netflix Machine Learning Algorithms Reliable (20)

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Anyscale
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
Databricks
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
Aly Abdelkareem
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
antimo musone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
DotNetCampus
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
DotNetCampus
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout Session
Splunk
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
Pierre Gutierrez
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout Session
Splunk
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
Hoa Le
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Splunk
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Shannon Cuthbertson
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
Anuj Gupta
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
Databricks
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Splunk
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
Ivo Andreev
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence Workshop
David Tan
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
Rui Quintino
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Anyscale
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
Databricks
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
Aly Abdelkareem
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
antimo musone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
DotNetCampus
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
DotNetCampus
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout Session
Splunk
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
Pierre Gutierrez
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout Session
Splunk
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
Hoa Le
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Splunk
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Shannon Cuthbertson
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
Anuj Gupta
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
Databricks
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Splunk
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
Ivo Andreev
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence Workshop
David Tan
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
Rui Quintino
 
Ad

More from Justin Basilico (6)

Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Justin Basilico
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Justin Basilico
 
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
Justin Basilico
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Justin Basilico
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Justin Basilico
 
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Ad

Recently uploaded (20)

Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)
Kaya Weers
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
Transcript: Canadian book publishing: Insights from the latest salary survey ...
Transcript: Canadian book publishing: Insights from the latest salary survey ...Transcript: Canadian book publishing: Insights from the latest salary survey ...
Transcript: Canadian book publishing: Insights from the latest salary survey ...
BookNet Canada
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
Unlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web AppsUnlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web Apps
Maximiliano Firtman
 
AI You Can Trust: The Critical Role of Governance and Quality.pdf
AI You Can Trust: The Critical Role of Governance and Quality.pdfAI You Can Trust: The Critical Role of Governance and Quality.pdf
AI You Can Trust: The Critical Role of Governance and Quality.pdf
Precisely
 
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptxSmart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Seasia Infotech
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
James Anderson
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Markus Eisele
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
How to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabberHow to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabber
eGrabber
 
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
Ivano Malavolta
 
Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)
Kaya Weers
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
Transcript: Canadian book publishing: Insights from the latest salary survey ...
Transcript: Canadian book publishing: Insights from the latest salary survey ...Transcript: Canadian book publishing: Insights from the latest salary survey ...
Transcript: Canadian book publishing: Insights from the latest salary survey ...
BookNet Canada
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
Unlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web AppsUnlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web Apps
Maximiliano Firtman
 
AI You Can Trust: The Critical Role of Governance and Quality.pdf
AI You Can Trust: The Critical Role of Governance and Quality.pdfAI You Can Trust: The Critical Role of Governance and Quality.pdf
AI You Can Trust: The Critical Role of Governance and Quality.pdf
Precisely
 
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptxSmart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Seasia Infotech
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
James Anderson
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Markus Eisele
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
How to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabberHow to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabber
eGrabber
 
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Conte...
Ivano Malavolta
 

Making Netflix Machine Learning Algorithms Reliable

  • 1. Making Netflix Machine Learning Algorithms Reliable Tony Jebara & Justin Basilico ICML Reliable ML in the Wild Workshop 2017-08-11
  • 3. 3 2017 > 100M members > 190 countries
  • 4. Goal: Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 5. Algorithm Areas ▪ Personalized Ranking ▪ Top-N Ranking ▪ Trending Now ▪ Continue Watching ▪ Video-Video Similarity ▪ Personalized Page Generation ▪ Search ▪ Personalized Image Selection ▪ ...
  • 6. Models & Algorithms ▪ Regression (linear, logistic, ...) ▪ Matrix Factorization ▪ Factorization Machines ▪ Clustering & Topic Models ▪ Bayesian Nonparametrics ▪ Tree Ensembles (RF, GBDTs, …) ▪ Neural Networks (Deep, RBMs, …) ▪ Gaussian Processes ▪ Bandits ▪ …
  • 7. A/B tests validate an overall approach works in expectation But they run in online production, so every A/B tested model needs to be reliable Innovation Cycle Idea Offline Experiment Full Online Deployment Online A/B Test
  • 8. 1) Collect massive data sets 2) Try billions of hypotheses to find* one(s) with support *Find with computational and statistical efficiency Batch Learning
  • 9. USERS TIME Collect Learn A/B Test Roll-out Data Model A B Batch Learning
  • 11. Explore and exploit → Less regret Helps cold start models in the wild Maintain some exploration for nonstationarity Adapt reliably to changing and new data Epsilon-greedy, UCB, Thompson Sampling, etc. USERS TIME x x x x x x x x x x x x x x x x x Bandit Learning
  • 12. USERS TIME Bandit Learning 1) Uniform population of hypotheses 2) Choose a random hypothesis h 3) Act according to h and observe outcome 4) Re-weight hypotheses 5) Go to 2 TS
  • 15. Different preferences for genre/theme portrayed Contextual bandits to personalize images
  • 16. Different preferences for cast members Contextual bandits to personalize images
  • 18. 18 Training Pipeline Application “Typical” ML Pipeline: A tale of two worlds Historical Data Generate Features Train Models Validate & Select Models Application Logic Live Data Load Model Offline Online Collect Labels Publish Model Evaluate Model Experimentation
  • 19. Offline: Model trains reliably Online: Model performs reliably … plus all involved systems and data What needs to be reliable?
  • 22. To be reliable, first learning must be repeatable Automate retraining of models Akin to continuous deployment in software engineering How often depends on application; typically daily Detect problems and fail fast to avoid using a bad model Retraining
  • 23. Example Training Pipeline ● Periodic retraining to refresh models ● Workflow system to manage pipeline ● Each step is a Spark or Docker job ● Each step has checks in place ○ Response: Stop workflow and send alerts
  • 24. What to check? ● For both absolute and relative to previous runs... ● Offline metrics on hold-out set ● Data size for train and test ● Feature distributions ● Feature importance ● Large (unexpected) changes in output between models ● Model coverage (e.g. number of items it can predict) ● Model integrity ● Error counters ● ...
  • 25. Example 1: number of samples Numberoftrainingsamples Time
  • 26. Example 1: number of samples Numberoftrainingsamples Time Alarm fires and model is not published due to anomaly
  • 27. Example 2: offline metrics Liftw.r.tbaseline Time
  • 28. Example 2: offline metrics Liftw.r.tbaseline Time Alarm fires and model is not published due to anomaly Absolute threshold: if lift < X then alarm also fires
  • 29. Testing - Unit testing and code coverage - Integration testing Improve quality and reliability of upstream data feeds Reuse same data and code offline and online to avoid training/serving skew Running from multiple random seeds Preventing offline training failures
  • 31. Check for invariants in inputs and outputs Can catch many unexpected changes, bad assumptions and engineering issues Examples: - Feature values are out-of-range - Output is NaN or Infinity - Probabilities are < 0 or > 1 or don’t sum to 1 Basic sanity checks
  • 32. Online staleness checks can cover a wide range of failures in the training and publishing Example checks: - How long ago was the most recent model published? - How old is the data it was trained on? - How old are the inputs it is consuming? Model and input staleness Stale Model Ageofmodel
  • 33. Track the quality of the model - Compare prediction to actual behavior - Online equivalents of offline metrics For online learning or bandits reserve a fraction of data for a simple policy (e.g. epsilon-greedy) as a sanity check Online metrics
  • 34. Your model isn’t working right or your input data is bad: Now what? Common approaches in personalization space: - Use previous model? - Use previous output? - Use simplified model or heuristic? - If calling system is resilient: turn off that subsystem? Response: Graceful Degradation
  • 35. Want to choose personalized images per profile - Image lookup has O(10M) requests per second (e.g. “House of Cards” -> Image URL) Approach: - Precompute show to image mapping per user near-line system - Store mapping in fast distributed cache - At request time - Lookup user-specific mapping in cache - Fallback to unpersonalized results - Store mapping for request - Secondary fallback to default image for a missing show Example: Image Precompute Personalized Selection (Precompute) Unpersonalized Default Image
  • 36. Netflix runs 100% in AWS cloud Needs to be reliable on unreliable infrastructure Want a service to operate when an AWS instance fails? Randomly terminate instances (Chaos Monkey) Want Netflix to operate when an entire AWS region is having a problem? Disable entire AWS regions (Chaos Kong) Prevention: Failure Injection
  • 37. What failure scenarios do you want your model to be robust to? Models are very sensitive to their input data - Can be noisy, corrupt, delayed, incomplete, missing, unknown Train model to be resilient by injecting these conditions into the training and testing data Failure Injection for ML
  • 38. Want to add a feature for type of row on homepage Genre, Because you watched, Top picks, ... Problem: Model may see new types online before in training data Solution: Add unknown category and perturb a small fraction of training data to that type Rule-of-thumb: Apply for all categorical features unless new category isn’t possible (Days of week -> OK, countries -> Not) Example: Categorical features
  • 40. ● Consider online learning and bandits ● Build off best practices from software engineering ● Automate as much as possible ● Adjust your data to cover the conditions you want your model to be resilient to ● Detect problems and degrade gracefully Take aways
  • 41. Thank you. Tony Jebara & Justin Basilico Yes, we are hiring!
  翻译: