Missing Values in Recommender Models
Parmeshwar Khurd, Ehsan Saberian & Maryam Esmaeili
ML Platform Meetup, 6/20/2019
Talk Outline
● Problem Statement: Missing Features in Recommender Systems (RS)
● Handling Missing Features in GBDTs
● Handling Missing Features in NNs
● Conclusion
01
Problem Statement
● Scientists and engineers in the mathematical sciences have long dealt with
the problem of missing observations
● Typical patterns in physics:
a. Astronomers fill in missing observations for orbits via least-squares:
Ceres example
b. Models to explain all observations including missing / future ones
i. Physicist proposes a new model explaining past observations that previous models
cannot adequately explain
ii. She realizes new model predicts events for which past observations do not exist
iii. New observations are collected to validate new model
Missing Observations vs. Missing Features
Physics Example from 100 Years Ago
● Einstein proposed general relativity model for
gravitation in 1915, an improvement over Newtonian
models, with two striking examples:
○ It better explained known observed shifts in
perihelion (closest point to Sun) of Mercury’s orbit
○ It predicted as yet unmeasured bending of light
from distant stars, e.g., during solar eclipse,
bending ~ 1.75 arc-seconds, twice Newtonian
prediction. Arms race to validate experimentally:
Eddington succeeded in May 1919
Non-parametric / Big-data Correlational Models
● We have already talked about several complex models:
○ Correlational: Assume time-dependent elliptical functional form for planetary orbit, fit/regress parameters
assuming normal noise to fill in missing past coordinates and predict future motion
○ Causal: Newton’s and Einstein’s causal models for gravitation yield PDEs for planetary motion,
explaining the functional forms of orbits / perihelion shifts and suggesting new observations no one had thought to measure
● In rest of talk, we focus on correlational models, but they are statistical and more complex:
○ trained on more data (both more features and samples)
○ non-parametric (decision trees) or many parameters (neural networks)
● But often the observation is not entirely missing, only a part of it:
○ an incomplete observation is called an observation with missing data
○ if the input (feature vector) is incomplete, it is an observation with missing features
Improving correlational ML Models in RS
● Given a context, a predictive ML model in a recommender system (RS) needs
to match users with items they might enjoy
● Thankfully, as ML engineers in the recommendation space, we need less
creativity and labor than Einstein / Eddington to improve models
● In supervised ML models, we can time-travel our (features, labels) to see
if our newer predictive models improve performance on historical offline
metrics [Netflix Delorean ML blog]
● Model improvements come from leveraging
○ business information (more appropriate metrics or inputs)
○ ML models: BERT, CatBoost, Factorization Machines, etc.
Problem of Missing Data in RS - I
● ML models in RS need to deal with missing data patterns for cases such as:
○ New users
○ New contexts (e.g., country, time-zone, language, device, row-type)
○ New items
○ Timeouts and failures in data microservices
○ Modeling causal impact of recommendations
○ Intent-to-treat
● Unfortunately, the last two problems are similar to the Einstein/Eddington example:
solutions involve causal models / contextual bandits and are discussed elsewhere [Netflix talk]
● Not handling missing labels: optimizing RS for longer-term reward (label) is a harder problem
[Netflix talk]
Problem of Missing Data in RS - II
Guiding principles in this talk for RS cold-start + other correlational missing
feature problems
● Let ML models handle missing values rather than imputing and/or adding
features (via models or simple statistics)
○ Both GBDTs and NNs allow this
● ML models are generally better at interpolation than extrapolation
○ The service has seen many past examples of handling new users, items and contexts
○ For robust extrapolation during timeouts or data service failures, add simulated
examples in training and/or impose feature monotonicity constraints
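The simulated-examples idea can be sketched as follows; this is a minimal pure-Python illustration (the helper name `augment_with_missing` and the feature names are hypothetical, not from the talk). A fraction of training rows is duplicated with one feature blanked out, so the model sees the "timeout" missing pattern at training time:

```python
import random

def augment_with_missing(rows, feature, frac=0.1, seed=0):
    """Duplicate a random fraction of rows with `feature` set to None,
    mimicking a data-service timeout for that feature."""
    rng = random.Random(seed)
    simulated = []
    for row in rows:
        if rng.random() < frac:
            corrupted = dict(row)
            corrupted[feature] = None  # simulate the feature being unavailable
            simulated.append(corrupted)
    return rows + simulated

train = [{"popularity": p, "label": p > 5} for p in range(10)]
augmented = augment_with_missing(train, "popularity", frac=0.5, seed=0)
```

The original rows are kept intact; only the appended copies carry the simulated missing value.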
New Users - I
● New users join Netflix every minute
New Users - II
● We get some taste information in the sign-up flow
● But clearly, we don’t know enough (what have they watched elsewhere, broader tastes, etc.) to personalize well
● Rather than try to extrapolate into the past, personalize progressively better as they interact with our service
New Contexts
● ML models in search / recommender systems need to respect user
language choice
● As new languages are supported, these choices will grow
New Items - I
● New items (e.g., SNL) are added to the Netflix service every day
New items - II
● New items lack any features
based on engagement data
● The “Coming Soon” tab shows trailers
○ This tab needs a personalized ranker as well
02
Handling Missing Features in
GBDTs
GBDT for RS
● Several packages to train GBDTs: XGBoost, R’s GBM, CatBoost,
LightGBM, Cognitive Foundry, sklearn, etc.
● XGBoost won several structured data Kaggle competitions
● Netflix talk on fast scoring of XGBoost models
● Dwell-time for Yahoo homepage recommender (RecSys 2014 Best Paper)
Source: XGBoost
(S)GBDT Background - I
Training Stochastic Gradient Boosted Decision Trees (SGBDTs) for (logistic) loss
minimization consists of one main algorithm (greedily learn the ensemble) and two
sub-algorithms (learn an individual tree, learn the split at each node of the tree):
1. Get the gradient of the (logistic) loss per example w.r.t. the current ensemble
2. Learn the tree structure
3. Learn each leaf coefficient by one iteration of Newton-Raphson
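The per-example gradients/hessians of the logistic loss and the one-step Newton-Raphson leaf coefficient can be sketched as below; this is a minimal pure-Python illustration under the usual conventions (labels in {0, 1}, L2 leaf regularization `reg`), not any particular package's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_gradients(labels, scores):
    """Gradient g = p - y and hessian h = p * (1 - p) of the logistic
    loss per example, where p = sigmoid(F) and F is the current
    ensemble's score."""
    grads = [sigmoid(f) - y for y, f in zip(labels, scores)]
    hess = [sigmoid(f) * (1.0 - sigmoid(f)) for f in scores]
    return grads, hess

def newton_leaf_value(grads, hess, reg=1.0):
    """One Newton-Raphson step for the leaf coefficient:
    w = -sum(g) / (sum(h) + reg)."""
    return -sum(grads) / (sum(hess) + reg)

labels = [1, 1, 0]
scores = [0.0, 0.0, 0.0]  # ensemble starts at zero
g, h = logistic_gradients(labels, scores)
leaf = newton_leaf_value(g, h)  # positive: pulls this leaf's scores toward 1
```

With two positive examples and one negative in the leaf, the Newton step yields a positive coefficient, as expected.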
(S)GBDT Background - II
● Find the best split via variance reduction
● Learn left and right subtrees recursively
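The Best-Split sub-algorithm can be sketched as follows; a minimal pure-Python version that scans midpoints between consecutive sorted feature values and maximizes the variance reduction of the (gradient) targets:

```python
def variance(values):
    """Population variance; 0 for an empty branch."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def best_split(xs, ys):
    """Pick the threshold maximizing
    Var(parent)*N - Var(left)*|left| - Var(right)*|right|."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_thr, best_gain = None, 0.0
    parent = variance(ys) * len(ys)
    for k in range(1, len(order)):
        thr = (xs[order[k - 1]] + xs[order[k]]) / 2.0
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        gain = parent - variance(left) * len(left) - variance(right) * len(right)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain

thr, gain = best_split([1, 2, 8, 9], [0.0, 0.0, 1.0, 1.0])
# thr == 5.0: the split that perfectly separates the two target groups
```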
Missing Value Handling with GBDTs: Taxonomy
● ESL-II (Section 9.6) mentions three ways to handle missing values:
○ Discard observations with any missing values
○ Impute all missing values before training via models or simple statistics:
item popularities may be initialized randomly, or to zero, or via weighted averaging, where
weights may indicate similarity determined via meta-data
○ Rely on the learning algorithm to deal with missing values in its training
phase, e.g., via surrogate splits or non-strict usage
in the tree:
■ Categoricals can include one more “missing” category
■ Continuous / categorical:
● Send example left or right for missing value appropriately (XGBoost)
● Use ternary split with missing branch (R’s GBM)
Missing Value Handling with R’s GBM
● Use a ternary split with a missing branch:
○ the weighted variance reduction in the Best-Split algorithm is updated to include the variance of the missing branch
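The ternary-split gain can be sketched as follows; a minimal pure-Python illustration (missing values encoded as `None`) where the missing branch contributes its own weighted variance term:

```python
def variance(values):
    """Population variance; 0 for an empty branch."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def ternary_split_gain(xs, ys, thr):
    """R GBM style: route examples to left/right/missing branches and
    compute the variance reduction including the missing branch."""
    left = [y for x, y in zip(xs, ys) if x is not None and x <= thr]
    right = [y for x, y in zip(xs, ys) if x is not None and x > thr]
    miss = [y for x, y in zip(xs, ys) if x is None]
    parent = variance(ys) * len(ys)
    children = sum(variance(b) * len(b) for b in (left, right, miss))
    return parent - children

gain = ternary_split_gain([1, 2, 8, 9, None, None],
                          [0.0, 0.0, 1.0, 1.0, 1.0, 1.0], thr=5.0)
```

Here the three branches are each pure, so the gain equals the full parent variance.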
Missing Value Handling with XGBoost
Always send examples with missing values left or right, as appropriate:
● Evaluate the best threshold and variance reduction in the Best-Split
algorithm from sending missing values left or right (post-hoc)
and then pick the better choice
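The learned "default direction" can be sketched as follows; a minimal pure-Python illustration (missing values as `None`) that tries appending the missing examples to each child and keeps whichever direction gives the larger variance reduction:

```python
def variance(values):
    """Population variance; 0 for an empty branch."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def default_direction(xs, ys, thr):
    """XGBoost style: evaluate the gain from sending missing values
    left vs. right at threshold `thr` and pick the better choice."""
    left = [y for x, y in zip(xs, ys) if x is not None and x <= thr]
    right = [y for x, y in zip(xs, ys) if x is not None and x > thr]
    miss = [y for x, y in zip(xs, ys) if x is None]
    parent = variance(ys) * len(ys)
    gain_left = (parent - variance(left + miss) * (len(left) + len(miss))
                 - variance(right) * len(right))
    gain_right = (parent - variance(left) * len(left)
                  - variance(right + miss) * (len(right) + len(miss)))
    return ("left", gain_left) if gain_left >= gain_right else ("right", gain_right)

direction, gain = default_direction([1, 2, 8, 9, None],
                                    [0.0, 0.0, 1.0, 1.0, 1.0], thr=5.0)
# the missing example's target matches the right branch, so "right" wins
```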
03
Handling Missing Features in
NNs
Recurrent Neural Network (NN) for RS
● YouTube latent cross
recurrent NN, WSDM 2018
● Trained with
TensorFlow/Keras
○ Other options include
PyTorch, MXNet,
CNTK, etc.
Missing Value Handling with NNs: Taxonomy
● Similar taxonomy as in the case of GBDTs
○ Discard observations with any missing values
■ Dropout: drop connections with missing values, scale up the others
○ Impute all missing values before training via models or simple
statistics: Item embeddings may be initialized randomly or to zeros or via weighted
averaging, where weights may indicate similarity determined via meta-data
○ Rely on the learning algorithm to deal with missing values in its
training phase via hidden layers
■ Categoricals: a single “missing” item hidden embedding, or DropoutNet (NIPS17)
■ Continuous / categorical: impute continuous + include a “missing” embedding, or
let the first hidden layer compute an “average” activation for the missing feature or item (NIPS18)
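The "missing" embedding idea can be sketched as follows; a minimal pure-Python illustration (item names and the `<missing>` key are hypothetical) where one extra learned vector is shared by all items that have no embedding yet:

```python
import random

EMB_DIM = 4
random.seed(0)

# Embedding table for known items (random init stands in for training).
item_embeddings = {item: [random.gauss(0, 1) for _ in range(EMB_DIM)]
                   for item in ["stranger_things", "the_crown"]}
# One extra learned vector shared by all unseen / missing items:
item_embeddings["<missing>"] = [0.0] * EMB_DIM

def lookup(item_id):
    """Map unknown or absent item ids to the shared 'missing' embedding."""
    if item_id is None or item_id not in item_embeddings:
        return item_embeddings["<missing>"]
    return item_embeddings[item_id]
```

At serving time, a brand-new title simply falls through to the shared vector; as engagement data arrives, it graduates to its own embedding.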
Missing Value Handling with DropoutNet
Auto-encoder with the item/user vector randomly retained or set to zero/average
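The DropoutNet-style training transform can be sketched as follows; a minimal pure-Python illustration (function and variable names are hypothetical) where the preference (engagement) vector is randomly zeroed so the network learns to score from content features alone, mimicking cold start:

```python
import random

def input_dropout(content_vec, pref_vec, p_drop=0.5, rng=None):
    """With probability p_drop, replace the preference vector with zeros
    before concatenating it with the content vector as network input."""
    rng = rng or random.Random()
    if rng.random() < p_drop:
        pref_vec = [0.0] * len(pref_vec)  # simulate a cold-start item/user
    return content_vec + pref_vec

rng = random.Random(0)
x = input_dropout([0.3, 0.7], [1.0, 2.0, 3.0], p_drop=1.0, rng=rng)
# p_drop=1.0 forces the cold-start case: preferences are zeroed
```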
Missing Value Handling with a Hidden “Average”
Partly closed-form “average” of the first hidden layer activation under a GMM model of the missing features
Conclusion
● There is a variety of ways to handle missing values in recommender models
● We only presented the subset of approaches that do not modify / impute
inputs and instead treat missing values within the training algorithm
● The optimal approach for a given problem is likely dataset-dependent!
References
● How Gauss Determined the Orbit of Ceres, J. Tennenbaum et al.
● Why Beauty Is Truth: A History of Symmetry, I. Stewart
● May 29, 1919: A Major Eclipse, Relatively Speaking, L. Buchen, Wired
● Delorean, H. Taghavi et al., Netflix
● Bandits for Recommendations, J. Kawale et al., Netflix
● Longer-term outcomes, B. Rostykus et al., Netflix
● Speeding up XGBoost Scoring, D. Parekh et al., Netflix
● Beyond Clicks: Dwell Time for Personalization, X. Yi et al.
● Latent Cross: Making Use of Context in Recurrent Recommender Systems, A. Beutel et al.
● ESL-II: The Elements of Statistical Learning, Hastie, Tibshirani, Friedman
● R GBM
● XGBoost
● Processing of Missing Data by Neural Networks, Smieja et al.
● DropoutNet: Addressing Cold Start in Recommender Systems, Volkovs et al.
● Inference and Missing Data, Biometrika, 63, 581–592, D. B. Rubin
Acknowledgments
The presenters wish to thank J. Basilico, H. Taghavi, Y. Raimond, S. Das, J. Kim, A. Deoras, C. Alvino and several
others for discussions and contributions
Thank You!
The Future of Cisco Cloud Security: Innovations and AI Integration
Re-solution Data Ltd
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
GyrusAI - Broadcasting & Streaming Applications Driven by AI and ML
GyrusAI - Broadcasting & Streaming Applications Driven by AI and MLGyrusAI - Broadcasting & Streaming Applications Driven by AI and ML
GyrusAI - Broadcasting & Streaming Applications Driven by AI and ML
Gyrus AI
 
Ad

Missing values in recommender models

  ○ trained on more data (both more features and more samples)
  ○ non-parametric (decision trees) or with many parameters (neural networks)
● Here, the observation itself is not missing, only a part of it:
  ○ an incomplete observation is called an observation with missing data
  ○ if the input is incomplete, it is an observation with missing features
Improving Correlational ML Models in RS
● Given a context, the predictive ML model in a recommender system (RS) needs to match users with items they might enjoy
● Thankfully, as ML engineers in the recommendation space, we need less creativity and labor than Einstein / Eddington to improve models
● In supervised ML models, we can time-travel our (features, labels) to see whether newer predictive models improve performance on historical offline metrics [Netflix Delorean ML blog]
● Model improvements come from leveraging:
  ○ business information (more appropriate metrics or inputs)
  ○ ML models: BERT, CatBoost, Factorization Machines, etc.
Problem of Missing Data in RS - I
● ML models in RS need to deal with missing-data patterns in cases such as:
  ○ New users
  ○ New contexts (e.g., country, time-zone, language, device, row-type)
  ○ New items
  ○ Timeouts and failures in data microservices
  ○ Modeling the causal impact of recommendations
  ○ Intent-to-treat
● Unfortunately, the last two problems resemble the Einstein/Eddington example: solutions involve causal models / contextual bandits and are discussed elsewhere [Netflix talk]
● We do not handle missing labels here: optimizing an RS for longer-term reward (label) is a harder problem [Netflix talk]
Problem of Missing Data in RS - II
Guiding principles in this talk for RS cold-start and other correlational missing-feature problems:
● Let ML models handle missing values rather than imputing and/or adding features (via models or simple statistics)
  ○ Both GBDTs and NNs allow this
● ML models are generally better at interpolation than extrapolation
  ○ There are many past examples of the service handling new users, items and contexts
  ○ For robust extrapolation during timeouts or data-service failures, add simulated examples in training and/or impose feature monotonicity constraints
New Users - I
● New users join Netflix every minute
New Users - II
● We get some taste information in the sign-up flow
● But clearly, we don’t know enough (what they have watched elsewhere, their broader tastes, etc.) to personalize well
● Rather than trying to extrapolate into the past, we personalize progressively better as they interact with our service
New Contexts
● ML models in search / recommender systems need to respect the user’s language choice
● As new languages are supported, these choices will grow
New Items - I
● New items are added to the Netflix service every day (e.g., SNL)
New Items - II
● New items lack any features based on engagement data
● The “Coming Soon” tab shows trailers
  ○ This tab needs a personalized ranker as well
GBDT for RS
● Several packages train GBDTs: XGBoost, R’s GBM, CatBoost, LightGBM, Cognitive Foundry, sklearn, etc.
● XGBoost has won several structured-data Kaggle competitions
● Netflix talk on fast scoring of XGBoost models
● Dwell-time for the Yahoo homepage recommender (RecSys 2014 Best Paper)
(Figure source: XGBoost)
(S)GBDT Background - I
Training Stochastic Gradient Boosted Decision Trees (SGBDTs) for (logistic) loss minimization consists of one main algorithm (greedily learn the ensemble) and two sub-algorithms (learn an individual tree; learn the split at each node of a tree):
● Get the gradient of the (logistic) loss per example w.r.t. the current ensemble
● Learn the tree structure
● Learn each leaf coefficient by one iteration of Newton-Raphson
(S)GBDT Background - II
● Find the best split via variance reduction
● Learn the left and right sub-trees recursively
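The two slides above can be condensed into a short sketch of the boosting loop: each round fits a regression tree to the negative gradient of the logistic loss, then sets each leaf coefficient with one Newton-Raphson step. This is an illustrative toy, assuming sklearn's `DecisionTreeRegressor` as the tree sub-algorithm; function names and hyper-parameters are ours, not from the talk.

```python
# Toy SGBDT training loop for logistic loss (Friedman-style boosting).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gbdt(X, y, n_rounds=30, lr=0.1, max_depth=3):
    F = np.zeros(len(y))                      # current ensemble scores
    trees = []
    for _ in range(n_rounds):
        p = sigmoid(F)
        grad = y - p                          # negative gradient of log-loss
        hess = p * (1.0 - p)                  # second derivative
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, grad)
        leaf = tree.apply(X)
        # One Newton-Raphson step per leaf: gamma = sum(grad) / sum(hess)
        gamma = {l: grad[leaf == l].sum() / max(hess[leaf == l].sum(), 1e-12)
                 for l in np.unique(leaf)}
        F += lr * np.array([gamma[l] for l in leaf])
        trees.append((tree, gamma))
    return trees

def predict_proba(trees, X, lr=0.1):
    F = np.zeros(len(X))
    for tree, gamma in trees:
        leaf = tree.apply(X)
        F += lr * np.array([gamma.get(l, 0.0) for l in leaf])
    return sigmoid(F)
```

The "stochastic" part (row/column subsampling per round) is omitted here for brevity.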
Missing Value Handling w/ GBDTs: Taxonomy
● ESL-II (Section 9.6) mentions three ways to handle missing values:
  ○ Discard observations with any missing values
  ○ Impute all missing values before training via models or simple statistics: item popularities may be initialized randomly, to zero, or via weighted averaging, where weights may indicate similarity determined via meta-data
  ○ Rely on the learning algorithm to deal with missing values in its training phase via surrogate splits or non-strict usage in the tree:
    ■ Categoricals can include one more “missing” category
    ■ Continuous / categorical:
      ● Send the example left or right for a missing value, as appropriate (XGBoost)
      ● Use a ternary split with a missing branch (R’s GBM)
Missing Value Handling w/ R’s GBM
● Use a ternary split with a missing branch:
  ○ The weighted variance reduction in the Best-Split algorithm is updated to include the missing branch’s variance
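The ternary-split idea can be sketched as follows. This is a synthetic helper written for illustration, not GBM source code: the missing values form a third child, and the gain is the parent's squared error minus the summed squared errors of left, right, and missing children.

```python
# Variance-reduction gain for a ternary split with a dedicated NaN branch.
import numpy as np

def ternary_split_gain(x, g, threshold):
    """Gain from splitting per-example gradients g on feature x at
    threshold, routing NaN feature values to their own third branch."""
    miss = np.isnan(x)
    left = (~miss) & (x <= threshold)
    right = (~miss) & (x > threshold)
    def sse(v):                       # sum of squared error around the mean
        return ((v - v.mean()) ** 2).sum() if len(v) else 0.0
    return sse(g) - (sse(g[left]) + sse(g[right]) + sse(g[miss]))
```

When missing values carry their own signal (e.g., "no engagement data yet"), the dedicated branch lets the tree exploit it directly.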
Missing Value Handling w/ XGBoost
● Always send the example left or right for a missing value, as appropriate:
  ○ In the Best-Split algorithm, evaluate the best threshold and variance reduction from sending missing values left or right (post-hoc), then pick the better choice
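A toy version of this "learn a default direction" search, written for illustration rather than taken from the XGBoost source (the real implementation also uses second-order statistics and regularization):

```python
# For each candidate threshold, try routing all missing rows left, then
# right, and keep the (threshold, direction) pair with the larger score.
import numpy as np

def best_split_with_default(x, g):
    """Return (threshold, default_direction) maximizing a simple split
    score sum_child (sum of gradients)^2 / child_size."""
    miss = np.isnan(x)
    gm, nm = g[miss].sum(), miss.sum()
    best = (-np.inf, None, None)
    for t in np.unique(x[~miss])[:-1]:
        sel = (~miss) & (x <= t)
        gl, nl = g[sel].sum(), sel.sum()
        gr, nr = g[~miss].sum() - gl, (~miss).sum() - nl
        for d, (L, nL, R, nR) in (("left",  (gl + gm, nl + nm, gr, nr)),
                                  ("right", (gl, nl, gr + gm, nr + nm))):
            score = L * L / max(nL, 1) + R * R / max(nR, 1)
            if score > best[0]:
                best = (score, t, d)
    return best[1], best[2]
```

The learned default direction is stored per node, so at scoring time a missing value is routed without any imputation.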
Recurrent Neural Network (NN) for RS
● YouTube latent-cross recurrent NN, WSDM 2018
● Trained with TensorFlow/Keras
  ○ Other options include PyTorch, MXNet, CNTK, etc.
Missing Value Handling w/ NNs: Taxonomy
● Similar taxonomy as in the GBDT case:
  ○ Discard observations with any missing values
    ■ Dropout: drop connections with missing values, scale up the others
  ○ Impute all missing values before training via models or simple statistics: item embeddings may be initialized randomly, to zeros, or via weighted averaging, where weights may indicate similarity determined via meta-data
  ○ Rely on the learning algorithm to deal with missing values in its training phase via hidden layers:
    ■ Categoricals: a single “missing”-item hidden embedding, or DropoutNet (NIPS 2017)
    ■ Continuous / categorical: impute continuous values and include a “missing” embedding, or have a hidden layer reach an “average” for the missing feature or item (NIPS 2018)
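The single "missing" embedding for categoricals can be sketched in a few lines. This is a plain-numpy stand-in for a Keras `Embedding` layer; the table, the reserved index 0, and the helper name are illustrative assumptions.

```python
# Reserve one embedding-table row for missing/unseen items so that cold-
# start lookups never fail and share a single learned "missing" vector.
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 1000, 16
MISSING = 0                              # reserved id for missing items
table = rng.standard_normal((n_items + 1, dim)) * 0.05

def embed(item_ids, known_ids):
    """Look up embeddings, mapping unknown ids to the MISSING row."""
    ids = np.array([i if i in known_ids else MISSING for i in item_ids])
    return table[ids]
```

During training the MISSING row receives gradient whenever an item is masked or unseen, so it converges to a sensible prior rather than staying at its random initialization.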
Missing Value Handling w/ DropoutNet
● Auto-encoder with the item/user vector randomly retained or set to zero/average
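The input-dropout idea behind DropoutNet can be sketched as a mini-batch transform. This is an illustrative helper under our own naming, not code from the DropoutNet paper: during training, a random subset of item (or user) preference vectors is zeroed out so the network learns to fall back on content features for cold-start cases.

```python
# DropoutNet-style input masking: randomly zero the item vectors of a
# fraction of the batch to simulate cold-start items during training.
import numpy as np

def dropoutnet_batch(user_vecs, item_vecs, p_drop=0.5, rng=None):
    rng = rng or np.random.default_rng()
    mask = rng.random(len(item_vecs)) < p_drop
    item_out = item_vecs.copy()
    item_out[mask] = 0.0              # these rows now look like new items
    return user_vecs, item_out, mask
```

The same transform applied to user vectors covers the new-user case; at serving time, genuinely missing vectors are simply passed in as zeros.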
Missing Value Handling w/ Hidden “Average”
● For a missing feature, the first hidden-layer activation is replaced by its expected value under a Gaussian mixture model (GMM) fit to the data, which is partly available in closed form [Smieja et al.]
Conclusion
● There are a variety of ways to handle missing values in recommender models
● We presented only a subset of approaches: those that do not modify / impute inputs and instead treat missing values within the training algorithm
● The optimal approach for a given problem is likely dataset-dependent!
References
● How Gauss Determined the Orbit of Ceres, J. Tennenbaum, et al.
● Why Beauty Is Truth: A History of Symmetry, I. Stewart
● “May 29, 1919: A Major Eclipse, Relatively Speaking”, L. Buchen, Wired
● Delorean, H. Taghavi, et al., Netflix
● Bandits for Recommendations, J. Kawale, et al., Netflix
● Longer-Term Outcomes, B. Rostykus, et al., Netflix
● Speeding up XGBoost Scoring, D. Parekh, et al., Netflix
● Beyond Clicks: Dwell Time for Personalization, X. Yi, et al.
● Latent Cross: Making Use of Context in Recurrent Recommender Systems, A. Beutel, et al.
● ESL-II: The Elements of Statistical Learning, Hastie, Tibshirani, Friedman
● R GBM
● XGBoost
● Processing of Missing Data by Neural Networks, M. Smieja, et al.
● DropoutNet: Addressing Cold Start in Recommender Systems, M. Volkovs, et al.
● Inference and Missing Data, D. B. Rubin, Biometrika, 63, 581–592
Acknowledgments
The presenters wish to thank J. Basilico, H. Taghavi, Y. Raimond, S. Das, J. Kim, A. Deoras, C. Alvino and several others for discussions and contributions.
Thank You!