SlideShare a Scribd company logo
L-SHAPLEY and C-SHAPLEY: Efficient
INTERPRETATION FOR STRUCTURED DATA
Kyoto University
Daiki Tanaka
Background
• Although many black-box ML models, such as
RandomForest, NN or kernel-methods, can produce highly
accurate prediction, but such predictions are lack of
interpretability.
1. Luck of interpretation is a crucial issue when black-box
models are applied in areas such as medicine, financial
markets, and criminal justice.
2. To be able to know the model’s reasoning is the good
way to improve the model.
Background : There are kinds of approaches for
interpreting models.
• Model-specific interpretation or Model-agnostic interpretation
• Model-specific : making some assumptions to models. (e.g. methods based on
attention weights, or gradient based method like smooth-grad, grad-CAM, …)
• Model-agnostic : making no assumption to models. (e.g. LIME, or Shapley
value)
• Model-level interpretation or Instance-wise interpretation
• Instance-wise : yielding feature importances for each input instance (e.g.
saliency map).
• Model-level : yielding feature importances for the whole model. (e.g. weights
of Logistic Regression, or decision rules of decision tree)
This study focuses on Model-Agnostic & Instance-wise interpretation.
Problem Setting : Model-Agnostic & Instance-wise
interpretation
• Input :
• An Instance
• A predictive model
• Output :
• A vector of importance scores of the feature.
• Indicating which features are the key for the model
to make its prediction on that instance.
Interpretation
method
Instance
Model
Importance
scores
Making no assumptions on model
Related work : Shapley value
• The Shapley value is an idea in the field of cooperative
game theory.
• This was originally proposed as a characterization of a fair
distribution of a total profit from all the players.
Person:A Person:B Person:C Profit
0 0 1 5
0 1 0 30
0 1 1 50
1 0 0 40
1 0 1 55
1 1 0 75
1 1 1 100
BA C
Related work : Shapley value
• The Shapley value of is defined as :
• is the set including all players. (ex. )
• is the function which returns the “profit” by the set .
• considers the contribution of the element .
• means the number of ways for selecting -sized subsets.
i
ϕ(i) =
1
N ∑
S⊆N{i}
1
(
|N| − 1
|S| )
(v(S ∪ {i}) − v(S))
N N = {person A, person B, person C}
v(S) S
v(S ∪ {i}) − v(S) i
(
|N| − 1
|S| )
|S|
Related work : Example
of Shapley value
• The Shapley value of person A is defined as :
ϕ(A) =
1
3 ∑
S⊆{A,B,C}{A}
1
(
|N| − 1
|S| )
(v(S ∪ {A}) − v(S))
=
1
3
(
1
1
(100 − 50) +
1
2
(55 − 5) +
1
2
(75 − 30))
Person:A Person:B Person:C Profit
0 0 1 5
0 1 0 30
0 1 1 50
1 0 0 40
1 0 1 55
1 1 0 75
1 1 1 100
Related work : Shapley value
• The Shapley value can be applied to predictive models.
• Each feature is seen as a player in the underlying game.
• Issue : Each evaluation of the Shapley value requires exponential number of
model evaluations. There are two kinds of approaches to deal with the problem.
• Approach1 : Sampling based method
• Randomly sampling feature subsets
• Approach2 : Regression based method
• Sampling feature subsets based on a weighted kernel, and carrying out a
linear regression to estimate Shapley value
Notation
• Feature Vector :
• Note that is the dimension of feature vectors.
• Set of features :
• Sub-vector of features :
• Output variables :
• Output vector of a model given an input vector :
x ∈ 𝒳 ⊂ ℝd
d
S ⊂ {1,2,…, d}
xs = {xj, j ∈ S}
y ∈ 𝒴
x ℙm(Y|x)
Preliminaries : Importance of a feature set
• Here, Importance score of feature set is introduced as:
• Where denotes the expectation over .
• The more similar the prediction produced by to the
prediction produced by , the higher becomes.
S
𝔼m[ ⋅ |x] ℙm( ⋅ |x)
xS
x vx(S)
vx(S) = 𝔼m[
log ℙm(Y|xs) ∣ x
]
Preliminaries : Importance of a feature set
• In many cases, class-specific importance is favored.
• How important is a feature set S to the predicted class?
• Here, following degenerate conditional distribution is
introduced.
• We can then define the importance of a subset S with
respect to using the modified score, which is the expected log
probability of the predicted class.
̂ℙm
̂ℙm(y|x) =
{
1 if y ∈ arg maxy′ℙm(y′|x)
0 otherwise .
vx(S) = ̂𝔼m[
log ℙm(Y|xs) ∣ x
]
Preliminaries : measuring interaction between features
• Consider quantifying the importance of a given -th feature for feature vector .
• A naive way is to compute the importance of set : .
• But it ignores interactions between features.
• For example, when performing sentiment analysis on the following sentence.
This movie is not heartwarming or entertaining.
• Then we wish to quantify the the importance of feature “not”, which plays an
important role in the sentence as being classified as negative.
• But one would expect that , because “not” itself has neither
negative nor positive sentiment.
i x
{i} Vx({i})
Vx({not}) ≈ 0
Preliminaries : marginal contributions of a feature
• It is essential to consider the interactions of a given feature with other
features.
• A natural way to assess how feature interacts with other features is to
compute difference between the importance of all features in S,
with and without .
• This difference is called marginal contribution of to S, and given by :
• To obtain a simple scaler measure for , we need to aggregate these
marginal contributions over all subsets S including .
• The Shapley value is one way to do so.
i
i
i
i
i
i
mx(S, i) := vx(S) − vx(Si)
Preliminaries : Shapley value
• For k = 1,2,…,d, we let denote the set of k-sized feature subsets
that contain feature .
• The Shapley value is obtained by averaging the marginal
contributions.
• First over the set for a fixed k.
• Then over all possible choices of set size k.
Sk(i)
i
Sk(i)
ϕx(i) =
1
d
d
∑
k=1
1
(
d − 1
k − 1)
∑
S∈Sk(i)
mx(S, i)
Set of k-sized feature sets including feature i
Challenge with computing Shapley value
• The exact computation of the Shapley value leads to
computational difficulties.
• We need to calculate marginal contributions for subsets.
• There are some sampling-based approaches to deal with the
problem.
• But such approaches suffer from high variance when the
number of samples to be collected per instance is limited.
2d−1
ϕx(i) =
1
d
d
∑
k=1
1
(
d − 1
k − 1)
∑
S∈Sk(i)
mx(S, i) =
∑
S∋i,S⊆[d]
1
(
d − 1
|S| − 1)
mx(S, i)
Proposed method
Relaxing computational difficulties of Shapley-value
Key idea : features can be seen as nodes of a
graph, and they have some relationship.
• In many applications, features can be considered as nodes of a
graph, and we can define distances between pairs of features
based on the graph structure.
• Features distant in the graph have weak interactions each
other.
• For example, an image is modeled with a grid graph. Pixels
that are far apart may have little effect on each other in the
computation of Shapley value.
• Or, a text is represented as a line graph.
This is A pen
Proposed method : preliminary
• We are given feature vector
• Then we let denote a connected graph
• Each feature is assigned with node .
• Edges represent the interactions between features.
• The graph induces a following distance function on .
• For a given node , its k-neighborhood is the set :
x ∈ ℝd
G = (V, E)
i i
V × V
i ∈ V
𝒩k(i) := {j ∈ V|dG(i, j) ≤ k}
dG(l, m) = the number of edges in shortest path joining l to m .
Gray area is an example of .𝒩2(i)
Proposed method1 : Local-Shapley (L-Shapley)
• Definition1: Given a model , a sample , and a feature ,
the L-Shapley estimate of order k on a graph is given by :
ℙm x i
G
̂ϕk
x(i) =
1
| 𝒩k(i)| ∑
T∋i,T⊆Nk(i)
1
(
|Nk(i)| − 1
|T| − 1 )
mx(T, i)
k-neighborhoods of i
original Shapley value : ϕx(i) =
∑
S∋i,S⊆[d]
1
(
d − 1
|S| − 1)
mx(S, i)
Proposed method2 : Connected-Shapley (C-Shapley)
• Definition2: Given a model , a sample , and a feature ,
the C-Shapley estimate of order k on a graph is given by :
where denotes the set of all subsets of that
contain node , and nodes that are connected in .
ℙm x i
G
Ck(i) 𝒩k(i)
i G
̂ϕk
x(i) =
∑
U∈Ck(i)
2
(|U| + 2)(|U| + 1)|U|
mx(U, i)
original Shapley value : ϕx(i) =
∑
S∋i,S⊆[d]
1
(
d − 1
|S| − 1)
mx(S, i)
Examples
• Left subset (blue and red) is summed over in L-Shapely, not
in C-Shapley.
• Right subset (blue and red) is summed over in both L-
Shapley and C-Shapley.
Properties : Error between L-Shapley value
and true Shapley value is upper-bounded.
• S is the subset of k-nearest features of i.
• is the sub-vector having k-nearest features of i.
• is the sub-vector having features not included by S.
XU
XV
Properties : Error between C-Shapley value
and true Shapley value is upper-bounded.
• S is the subset of k-nearest features of i.
• is the connected subset in S.U ∋ i
Experiments
Experiments : tasks and baselines
• Tasks : image classification, and text classification
• Baselines : model agnostic methods
• KernelSHAP : regression based approximation of Shapley
• SampleShapley : Random sampling based approximation
of Shapley
• LIME : model agnostic interpretation method linearly
approximating the black-box function around the target
feature.
Experiments : evaluation method
• Evaluation method :
• The change in log-odds scores of the predicted class
before and after masking the top features ranked by
importance scores, where masked words are replaced by
zero paddings.
• Larger decreasing of log-odds means that importance
scores by algorithm could correctly capture importance
of features.
Experiment(1/3) : Text classification
• We study the performances on three neural models and
three datasets:
Dataset Task Method Accuracy
IMDB-Review Sentiment classification Word-based CNN 90.1%
AG news Category classification Character-Based
CNN
90.09%
Yahoo! Answers Category classification LSTM 70.84%
Experiment(1/3) : result
• On IMDB with Word-CNN, the simplest model among the
three, L-Shapley achieves the best performance while LIME,
KernelSHAP and C-Shapley achieve slightly worse
performance.
Better
Experiment(2/3) : Image classification
• Datasets :
• A subset of MNIST : Only “3" and “8” are included.
• A subset of CIFAR-10 : Only deers(鹿) and horses(馬) are
included.
Experiment(2/3) : Result
• C-Shapley outperform other methods in both datasets.
Better
Experiment(2/3) : Examples for misclassified
images
• Above image is “3”, and below image is
“8”. They are misclassified into “8”, and
“3”, respectively.
• The masked pixels are colored with red if
activated, (white) and blue otherwise.
• The result seems to show the “reasoning”
of the classifier.
Experiment(3/3) : Evaluation by human
subjects (5 people)
• They use Amazon Mechanical Turk to compare L-Shapley, C-
Shapley and KernelSHAP on IMDB movie reviews (200 reviews).
• Experimental purpose :
• Are humans able to make a decision with top words alone?
• Are humans unable to make a decision with top words masked?
• They ask subjects to classify the sentiment of texts into five
categories : strongly positive (+2), positive (+1), neutral (0),
negative (-1), strongly negative (-2).
Experiment(3/3) : Evaluation by human
subjects (5 people)
• Texts have three types :
1. raw reviews
2. top 10 words of each review ranked by L-Shapley, C-Shapley and
KernelSHAP
3. reviews with top words being masked
• Masked words are produced by the L-Shapley, C-Shapley, and
KernelSHAP until the probability score of the correct class produced by
the model is lower than 10%.
• Around 14.6% of words in each review are masked for L-Shapley and C-
Shapley, and 31.6% for KernelSHAP.
Experiment(3/3) : Evaluation by human
subjects (5 people)
• Evaluation metrics :
• Consistency (0 or 1) between true labels and labels from
human subjects.
• Standard deviation of scores on each reviews
• This is as a measure of disagreement between humans.
• The absolute value of the averaged scores.
• This is as a measure of confidence of decision.
Experiment(3/3) : result
• Humans become more consistent and confident when they are presented
with top words. On the other hand, when top words are masked, humans are
easier to make mistakes and are less certain.
• C-Shapley yields the highest performance in terms of consistency, agreement,
and confidence.
• L-Shapley harms the human judgement the most among the three algorithms.
Conclusion
• They have proposed two algorithms; L-Shapley and C-
Shapley for instance-wise and model-agnostic
interpretation.
• They demonstrated the superior performance of these
algorithms compared to other methods on black-box
models in both text and image classification with both
quantitative metrics and human evaluation.
Ad

More Related Content

What's hot (20)

Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networks
ananth
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
Alexandros Karatzoglou
 
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Skip RNN: Learning to Skip State Updates in Recurrent Neural NetworksSkip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Universitat Politècnica de Catalunya
 
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Universitat Politècnica de Catalunya
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
佳蓉 倪
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential Data
Yao-Chieh Hu
 
Deep learning and image analytics using Python by Dr Sanparit
Deep learning and image analytics using Python by Dr SanparitDeep learning and image analytics using Python by Dr Sanparit
Deep learning and image analytics using Python by Dr Sanparit
BAINIDA
 
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
Universitat Politècnica de Catalunya
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision: Deep Networks (UPC 2016)
Deep Learning for Computer Vision: Deep Networks (UPC 2016)Deep Learning for Computer Vision: Deep Networks (UPC 2016)
Deep Learning for Computer Vision: Deep Networks (UPC 2016)
Universitat Politècnica de Catalunya
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
Hye-min Ahn
 
Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processing
ananth
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
Balázs Hidasi
 
Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...
Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...
Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...
Balázs Hidasi
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
ananth
 
The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task
The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” TaskThe Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task
The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task
multimediaeval
 
Recent Progress in RNN and NLP
Recent Progress in RNN and NLPRecent Progress in RNN and NLP
Recent Progress in RNN and NLP
hytae
 
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
Face recognition and deep learning  โดย ดร. สรรพฤทธิ์ มฤคทัต NECTECFace recognition and deep learning  โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
BAINIDA
 
TypeScript and Deep Learning
TypeScript and Deep LearningTypeScript and Deep Learning
TypeScript and Deep Learning
Oswald Campesato
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
MLconf
 
Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networks
ananth
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
Alexandros Karatzoglou
 
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Skip RNN: Learning to Skip State Updates in Recurrent Neural NetworksSkip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Universitat Politècnica de Catalunya
 
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Universitat Politècnica de Catalunya
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
佳蓉 倪
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential Data
Yao-Chieh Hu
 
Deep learning and image analytics using Python by Dr Sanparit
Deep learning and image analytics using Python by Dr SanparitDeep learning and image analytics using Python by Dr Sanparit
Deep learning and image analytics using Python by Dr Sanparit
BAINIDA
 
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
Universitat Politècnica de Catalunya
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Universitat Politècnica de Catalunya
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
Hye-min Ahn
 
Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processing
ananth
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
Balázs Hidasi
 
Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...
Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...
Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...
Balázs Hidasi
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
ananth
 
The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task
The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” TaskThe Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task
The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task
multimediaeval
 
Recent Progress in RNN and NLP
Recent Progress in RNN and NLPRecent Progress in RNN and NLP
Recent Progress in RNN and NLP
hytae
 
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
Face recognition and deep learning  โดย ดร. สรรพฤทธิ์ มฤคทัต NECTECFace recognition and deep learning  โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
BAINIDA
 
TypeScript and Deep Learning
TypeScript and Deep LearningTypeScript and Deep Learning
TypeScript and Deep Learning
Oswald Campesato
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
MLconf
 

Similar to [Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR STRUCTURED DATA (20)

Linear Regression.pptx
Linear Regression.pptxLinear Regression.pptx
Linear Regression.pptx
Ramakrishna Reddy Bijjam
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
Kuppusamy P
 
Data Mining Lecture_9.pptx
Data Mining Lecture_9.pptxData Mining Lecture_9.pptx
Data Mining Lecture_9.pptx
Subrata Kumer Paul
 
Lecture 5 - Linear Regression Linear Regression
Lecture 5 - Linear Regression Linear RegressionLecture 5 - Linear Regression Linear Regression
Lecture 5 - Linear Regression Linear Regression
viyah59114
 
Lecture 8 about data mining and how to use it.pptx
Lecture 8 about data mining and how to use it.pptxLecture 8 about data mining and how to use it.pptx
Lecture 8 about data mining and how to use it.pptx
HedraAtif
 
pca.pdf polymer nanoparticles and sensors
pca.pdf polymer nanoparticles and sensorspca.pdf polymer nanoparticles and sensors
pca.pdf polymer nanoparticles and sensors
vincyshamleyeben
 
Spectral Clustering Report
Spectral Clustering ReportSpectral Clustering Report
Spectral Clustering Report
Miaolan Xie
 
Data mining knowledge representation Notes
Data mining knowledge representation NotesData mining knowledge representation Notes
Data mining knowledge representation Notes
RevathiSundar4
 
Unit-1 Introduction and Mathematical Preliminaries.pptx
Unit-1 Introduction and Mathematical Preliminaries.pptxUnit-1 Introduction and Mathematical Preliminaries.pptx
Unit-1 Introduction and Mathematical Preliminaries.pptx
avinashBajpayee1
 
Fundamentals of Machine Learning.pptx
Fundamentals of Machine Learning.pptxFundamentals of Machine Learning.pptx
Fundamentals of Machine Learning.pptx
WiamFADEL
 
DimensionalityReduction.pptx
DimensionalityReduction.pptxDimensionalityReduction.pptx
DimensionalityReduction.pptx
36rajneekant
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
SakkaravarthiShanmug
 
Hashing in data base managementsystem_Updated.pptx
Hashing in data base managementsystem_Updated.pptxHashing in data base managementsystem_Updated.pptx
Hashing in data base managementsystem_Updated.pptx
yuvrajbhadoriya13
 
DataAnalysis in machine learning using different techniques
DataAnalysis in machine learning using different techniquesDataAnalysis in machine learning using different techniques
DataAnalysis in machine learning using different techniques
mtwnc202302
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
MIT
 
141205 graphulo ingraphblas
141205 graphulo ingraphblas141205 graphulo ingraphblas
141205 graphulo ingraphblas
graphulo
 
02 MATLAB Polynomials.pptx using of matlab
02 MATLAB Polynomials.pptx using of matlab02 MATLAB Polynomials.pptx using of matlab
02 MATLAB Polynomials.pptx using of matlab
farzikaam26
 
Linear regression by Kodebay
Linear regression by KodebayLinear regression by Kodebay
Linear regression by Kodebay
Kodebay
 
Deep learning from mashine learning AI..
Deep learning from mashine learning AI..Deep learning from mashine learning AI..
Deep learning from mashine learning AI..
premkumarlive
 
feature matching and model fitting .pptx
feature matching and model fitting .pptxfeature matching and model fitting .pptx
feature matching and model fitting .pptx
demepa1337
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
Kuppusamy P
 
Lecture 5 - Linear Regression Linear Regression
Lecture 5 - Linear Regression Linear RegressionLecture 5 - Linear Regression Linear Regression
Lecture 5 - Linear Regression Linear Regression
viyah59114
 
Lecture 8 about data mining and how to use it.pptx
Lecture 8 about data mining and how to use it.pptxLecture 8 about data mining and how to use it.pptx
Lecture 8 about data mining and how to use it.pptx
HedraAtif
 
pca.pdf polymer nanoparticles and sensors
pca.pdf polymer nanoparticles and sensorspca.pdf polymer nanoparticles and sensors
pca.pdf polymer nanoparticles and sensors
vincyshamleyeben
 
Spectral Clustering Report
Spectral Clustering ReportSpectral Clustering Report
Spectral Clustering Report
Miaolan Xie
 
Data mining knowledge representation Notes
Data mining knowledge representation NotesData mining knowledge representation Notes
Data mining knowledge representation Notes
RevathiSundar4
 
Unit-1 Introduction and Mathematical Preliminaries.pptx
Unit-1 Introduction and Mathematical Preliminaries.pptxUnit-1 Introduction and Mathematical Preliminaries.pptx
Unit-1 Introduction and Mathematical Preliminaries.pptx
avinashBajpayee1
 
Fundamentals of Machine Learning.pptx
Fundamentals of Machine Learning.pptxFundamentals of Machine Learning.pptx
Fundamentals of Machine Learning.pptx
WiamFADEL
 
DimensionalityReduction.pptx
DimensionalityReduction.pptxDimensionalityReduction.pptx
DimensionalityReduction.pptx
36rajneekant
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
SakkaravarthiShanmug
 
Hashing in data base managementsystem_Updated.pptx
Hashing in data base managementsystem_Updated.pptxHashing in data base managementsystem_Updated.pptx
Hashing in data base managementsystem_Updated.pptx
yuvrajbhadoriya13
 
DataAnalysis in machine learning using different techniques
DataAnalysis in machine learning using different techniquesDataAnalysis in machine learning using different techniques
DataAnalysis in machine learning using different techniques
mtwnc202302
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
MIT
 
141205 graphulo ingraphblas
141205 graphulo ingraphblas141205 graphulo ingraphblas
141205 graphulo ingraphblas
graphulo
 
02 MATLAB Polynomials.pptx using of matlab
02 MATLAB Polynomials.pptx using of matlab02 MATLAB Polynomials.pptx using of matlab
02 MATLAB Polynomials.pptx using of matlab
farzikaam26
 
Linear regression by Kodebay
Linear regression by KodebayLinear regression by Kodebay
Linear regression by Kodebay
Kodebay
 
Deep learning from mashine learning AI..
Deep learning from mashine learning AI..Deep learning from mashine learning AI..
Deep learning from mashine learning AI..
premkumarlive
 
feature matching and model fitting .pptx
feature matching and model fitting .pptxfeature matching and model fitting .pptx
feature matching and model fitting .pptx
demepa1337
 
Ad

More from Daiki Tanaka (12)

[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
Daiki Tanaka
 
カーネル法:正定値カーネルの理論
カーネル法:正定値カーネルの理論カーネル法:正定値カーネルの理論
カーネル法:正定値カーネルの理論
Daiki Tanaka
 
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
Daiki Tanaka
 
Selective inference
Selective inferenceSelective inference
Selective inference
Daiki Tanaka
 
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Daiki Tanaka
 
オンライン学習 : Online learning
オンライン学習 : Online learningオンライン学習 : Online learning
オンライン学習 : Online learning
Daiki Tanaka
 
Local Outlier Detection with Interpretation
Local Outlier Detection with InterpretationLocal Outlier Detection with Interpretation
Local Outlier Detection with Interpretation
Daiki Tanaka
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learning
Daiki Tanaka
 
The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ...
The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ...The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ...
The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ...
Daiki Tanaka
 
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
The Limits of Popularity-Based Recommendations, and the Role of Social TiesThe Limits of Popularity-Based Recommendations, and the Role of Social Ties
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
Daiki Tanaka
 
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Daiki Tanaka
 
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series DataToeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Daiki Tanaka
 
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
[Paper Reading] Theoretical Analysis of Self-Training with Deep Networks on U...
Daiki Tanaka
 
カーネル法:正定値カーネルの理論
カーネル法:正定値カーネルの理論カーネル法:正定値カーネルの理論
カーネル法:正定値カーネルの理論
Daiki Tanaka
 
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference
Daiki Tanaka
 
Selective inference
Selective inferenceSelective inference
Selective inference
Daiki Tanaka
 
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Anomaly Detection with VAEGAN and Attention [JSAI2019 report]
Daiki Tanaka
 
オンライン学習 : Online learning
オンライン学習 : Online learningオンライン学習 : Online learning
オンライン学習 : Online learning
Daiki Tanaka
 
Local Outlier Detection with Interpretation
Local Outlier Detection with InterpretationLocal Outlier Detection with Interpretation
Local Outlier Detection with Interpretation
Daiki Tanaka
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learning
Daiki Tanaka
 
The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ...
The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ...The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ...
The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ...
Daiki Tanaka
 
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
The Limits of Popularity-Based Recommendations, and the Role of Social TiesThe Limits of Popularity-Based Recommendations, and the Role of Social Ties
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
Daiki Tanaka
 
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci...
Daiki Tanaka
 
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series DataToeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Daiki Tanaka
 
Ad

Recently uploaded (20)

Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Construction-Chemicals-For-Waterproofing.ppt
Construction-Chemicals-For-Waterproofing.pptConstruction-Chemicals-For-Waterproofing.ppt
Construction-Chemicals-For-Waterproofing.ppt
ssuser2ffcbc
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdfIBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
VigneshPalaniappanM
 
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning ModelsMode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Mode-Wise Corridor Level Travel-Time Estimation Using Machine Learning Models
Journal of Soft Computing in Civil Engineering
 
Agents chapter of Artificial intelligence
Agents chapter of Artificial intelligenceAgents chapter of Artificial intelligence
Agents chapter of Artificial intelligence
DebdeepMukherjee9
 
introduction technology technology tec.pptx
introduction technology technology tec.pptxintroduction technology technology tec.pptx
introduction technology technology tec.pptx
Iftikhar70
 
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Journal of Soft Computing in Civil Engineering
 
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
SanjeetMishra29
 
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdfATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ssuserda39791
 
David Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And PythonDavid Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And Python
David Boutry
 
Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control Monthly May 2025Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control
 
Environment .................................
Environment .................................Environment .................................
Environment .................................
shadyozq9
 
Working with USDOT UTCs: From Conception to Implementation
Working with USDOT UTCs: From Conception to ImplementationWorking with USDOT UTCs: From Conception to Implementation
Working with USDOT UTCs: From Conception to Implementation
Alabama Transportation Assistance Program
 
vtc2018fall_otfs_tutorial_presentation_1.pdf
vtc2018fall_otfs_tutorial_presentation_1.pdfvtc2018fall_otfs_tutorial_presentation_1.pdf
vtc2018fall_otfs_tutorial_presentation_1.pdf
RaghavaGD1
 
Optimizing Reinforced Concrete Cantilever Retaining Walls Using Gases Brownia...
Optimizing Reinforced Concrete Cantilever Retaining Walls Using Gases Brownia...Optimizing Reinforced Concrete Cantilever Retaining Walls Using Gases Brownia...
Optimizing Reinforced Concrete Cantilever Retaining Walls Using Gases Brownia...
Journal of Soft Computing in Civil Engineering
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software ApplicationsJacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Construction-Chemicals-For-Waterproofing.ppt
Construction-Chemicals-For-Waterproofing.pptConstruction-Chemicals-For-Waterproofing.ppt
Construction-Chemicals-For-Waterproofing.ppt
ssuser2ffcbc
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdfIBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
IBAAS 2023 Series_Lecture 8- Dr. Nandi.pdf
VigneshPalaniappanM
 
Agents chapter of Artificial intelligence
Agents chapter of Artificial intelligenceAgents chapter of Artificial intelligence
Agents chapter of Artificial intelligence
DebdeepMukherjee9
 
introduction technology technology tec.pptx
introduction technology technology tec.pptxintroduction technology technology tec.pptx
introduction technology technology tec.pptx
Iftikhar70
 
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
SanjeetMishra29
 
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdfATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ssuserda39791
 
David Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And PythonDavid Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And Python
David Boutry
 
Environment .................................
Environment .................................Environment .................................
Environment .................................
shadyozq9
 
vtc2018fall_otfs_tutorial_presentation_1.pdf
vtc2018fall_otfs_tutorial_presentation_1.pdfvtc2018fall_otfs_tutorial_presentation_1.pdf
vtc2018fall_otfs_tutorial_presentation_1.pdf
RaghavaGD1
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software ApplicationsJacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia
 

[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR STRUCTURED DATA

  • 1. L-SHAPLEY and C-SHAPLEY: Efficient INTERPRETATION FOR STRUCTURED DATA Kyoto University Daiki Tanaka
  • 2. Background • Although many black-box ML models, such as RandomForest, NN or kernel-methods, can produce highly accurate prediction, but such predictions are lack of interpretability. 1. Luck of interpretation is a crucial issue when black-box models are applied in areas such as medicine, financial markets, and criminal justice. 2. To be able to know the model’s reasoning is the good way to improve the model.
  • 3. Background : There are kinds of approaches for interpreting models. • Model-specific interpretation or Model-agnostic interpretation • Model-specific : making some assumptions to models. (e.g. methods based on attention weights, or gradient based method like smooth-grad, grad-CAM, …) • Model-agnostic : making no assumption to models. (e.g. LIME, or Shapley value) • Model-level interpretation or Instance-wise interpretation • Instance-wise : yielding feature importances for each input instance (e.g. saliency map). • Model-level : yielding feature importances for the whole model. (e.g. weights of Logistic Regression, or decision rules of decision tree) This study focuses on Model-Agnostic & Instance-wise interpretation.
  • 4. Problem Setting : Model-Agnostic & Instance-wise interpretation • Input : • An Instance • A predictive model • Output : • A vector of importance scores of the feature. • Indicating which features are the key for the model to make its prediction on that instance. Interpretation method Instance Model Importance scores Making no assumptions on model
  • 5. Related work : Shapley value • The Shapley value is an idea in the field of cooperative game theory. • This was originally proposed as a characterization of a fair distribution of a total profit from all the players. Person:A Person:B Person:C Profit 0 0 1 5 0 1 0 30 0 1 1 50 1 0 0 40 1 0 1 55 1 1 0 75 1 1 1 100 BA C
  • 6. Related work : Shapley value • The Shapley value of is defined as : • is the set including all players. (ex. ) • is the function which returns the “profit” by the set . • considers the contribution of the element . • means the number of ways for selecting -sized subsets. i ϕ(i) = 1 N ∑ S⊆N{i} 1 ( |N| − 1 |S| ) (v(S ∪ {i}) − v(S)) N N = {person A, person B, person C} v(S) S v(S ∪ {i}) − v(S) i ( |N| − 1 |S| ) |S|
  • 7. Related work : Example of Shapley value • The Shapley value of person A is defined as : ϕ(A) = 1 3 ∑ S⊆{A,B,C}{A} 1 ( |N| − 1 |S| ) (v(S ∪ {A}) − v(S)) = 1 3 ( 1 1 (100 − 50) + 1 2 (55 − 5) + 1 2 (75 − 30)) Person:A Person:B Person:C Profit 0 0 1 5 0 1 0 30 0 1 1 50 1 0 0 40 1 0 1 55 1 1 0 75 1 1 1 100
  • 8. Related work : Shapley value • The Shapley value can be applied to predictive models. • Each feature is seen as a player in the underlying game. • Issue : Each evaluation of the Shapley value requires exponential number of model evaluations. There are two kinds of approaches to deal with the problem. • Approach1 : Sampling based method • Randomly sampling feature subsets • Approach2 : Regression based method • Sampling feature subsets based on a weighted kernel, and carrying out a linear regression to estimate Shapley value
  • 9. Notation • Feature Vector : • Note that is the dimension of feature vectors. • Set of features : • Sub-vector of features : • Output variables : • Output vector of a model given an input vector : x ∈ 𝒳 ⊂ ℝd d S ⊂ {1,2,…, d} xs = {xj, j ∈ S} y ∈ 𝒴 x ℙm(Y|x)
  • 10. Preliminaries : Importance of a feature set • Here, Importance score of feature set is introduced as: • Where denotes the expectation over . • The more similar the prediction produced by to the prediction produced by , the higher becomes. S 𝔼m[ ⋅ |x] ℙm( ⋅ |x) xS x vx(S) vx(S) = 𝔼m[ log ℙm(Y|xs) ∣ x ]
  • 11. Preliminaries : Importance of a feature set • In many cases, class-specific importance is favored. • How important is a feature set S to the predicted class? • Here, following degenerate conditional distribution is introduced. • We can then define the importance of a subset S with respect to using the modified score, which is the expected log probability of the predicted class. ̂ℙm ̂ℙm(y|x) = { 1 if y ∈ arg maxy′ℙm(y′|x) 0 otherwise . vx(S) = ̂𝔼m[ log ℙm(Y|xs) ∣ x ]
  • 12. Preliminaries : measuring interaction between features • Consider quantifying the importance of a given -th feature for feature vector . • A naive way is to compute the importance of set : . • But it ignores interactions between features. • For example, when performing sentiment analysis on the following sentence. This movie is not heartwarming or entertaining. • Then we wish to quantify the the importance of feature “not”, which plays an important role in the sentence as being classified as negative. • But one would expect that , because “not” itself has neither negative nor positive sentiment. i x {i} Vx({i}) Vx({not}) ≈ 0
  • 13. Preliminaries : marginal contributions of a feature • It is essential to consider the interactions of a given feature with other features. • A natural way to assess how feature interacts with other features is to compute difference between the importance of all features in S, with and without . • This difference is called marginal contribution of to S, and given by : • To obtain a simple scaler measure for , we need to aggregate these marginal contributions over all subsets S including . • The Shapley value is one way to do so. i i i i i i mx(S, i) := vx(S) − vx(Si)
  • 14. Preliminaries : Shapley value • For k = 1,2,…,d, we let denote the set of k-sized feature subsets that contain feature . • The Shapley value is obtained by averaging the marginal contributions. • First over the set for a fixed k. • Then over all possible choices of set size k. Sk(i) i Sk(i) ϕx(i) = 1 d d ∑ k=1 1 ( d − 1 k − 1) ∑ S∈Sk(i) mx(S, i) Set of k-sized feature sets including feature i
  • 15. Challenge with computing Shapley value • The exact computation of the Shapley value leads to computational difficulties. • We need to calculate marginal contributions for subsets. • There are some sampling-based approaches to deal with the problem. • But such approaches suffer from high variance when the number of samples to be collected per instance is limited. 2d−1 ϕx(i) = 1 d d ∑ k=1 1 ( d − 1 k − 1) ∑ S∈Sk(i) mx(S, i) = ∑ S∋i,S⊆[d] 1 ( d − 1 |S| − 1) mx(S, i)
  • 16. Proposed method Relaxing computational difficulties of Shapley-value
  • 17. Key idea : features can be seen as nodes of a graph, and they have some relationship. • In many applications, features can be considered as nodes of a graph, and we can define distances between pairs of features based on the graph structure. • Features distant in the graph have weak interactions each other. • For example, an image is modeled with a grid graph. Pixels that are far apart may have little effect on each other in the computation of Shapley value. • Or, a text is represented as a line graph. This is A pen
  • 18. Proposed method : preliminary • We are given feature vector • Then we let denote a connected graph • Each feature is assigned with node . • Edges represent the interactions between features. • The graph induces a following distance function on . • For a given node , its k-neighborhood is the set : x ∈ ℝd G = (V, E) i i V × V i ∈ V 𝒩k(i) := {j ∈ V|dG(i, j) ≤ k} dG(l, m) = the number of edges in shortest path joining l to m . Gray area is an example of .𝒩2(i)
  • 19. Proposed method1 : Local-Shapley (L-Shapley) • Definition1: Given a model , a sample , and a feature , the L-Shapley estimate of order k on a graph is given by : ℙm x i G ̂ϕk x(i) = 1 | 𝒩k(i)| ∑ T∋i,T⊆Nk(i) 1 ( |Nk(i)| − 1 |T| − 1 ) mx(T, i) k-neighborhoods of i original Shapley value : ϕx(i) = ∑ S∋i,S⊆[d] 1 ( d − 1 |S| − 1) mx(S, i)
  • 20. Proposed method2 : Connected-Shapley (C-Shapley) • Definition2: Given a model , a sample , and a feature , the C-Shapley estimate of order k on a graph is given by : where denotes the set of all subsets of that contain node , and nodes that are connected in . ℙm x i G Ck(i) 𝒩k(i) i G ̂ϕk x(i) = ∑ U∈Ck(i) 2 (|U| + 2)(|U| + 1)|U| mx(U, i) original Shapley value : ϕx(i) = ∑ S∋i,S⊆[d] 1 ( d − 1 |S| − 1) mx(S, i)
  • 21. Examples • Left subset (blue and red) is summed over in L-Shapely, not in C-Shapley. • Right subset (blue and red) is summed over in both L- Shapley and C-Shapley.
  • 22. Properties : Error between L-Shapley value and true Shapley value is upper-bounded. • S is the subset of k-nearest features of i. • is the sub-vector having k-nearest features of i. • is the sub-vector having features not included by S. XU XV
  • 23. Properties : Error between C-Shapley value and true Shapley value is upper-bounded. • S is the subset of k-nearest features of i. • is the connected subset in S.U ∋ i
  • 25. Experiments : tasks and baselines • Tasks : image classification, and text classification • Baselines : model agnostic methods • KernelSHAP : regression based approximation of Shapley • SampleShapley : Random sampling based approximation of Shapley • LIME : model agnostic interpretation method linearly approximating the black-box function around the target feature.
  • 26. Experiments : evaluation method • Evaluation method : • The change in log-odds scores of the predicted class before and after masking the top features ranked by importance scores, where masked words are replaced by zero paddings. • Larger decreasing of log-odds means that importance scores by algorithm could correctly capture importance of features.
  • 27. Experiment(1/3) : Text classification • We study the performances on three neural models and three datasets: Dataset Task Method Accuracy IMDB-Review Sentiment classification Word-based CNN 90.1% AG news Category classification Character-Based CNN 90.09% Yahoo! Answers Category classification LSTM 70.84%
  • 28. Experiment(1/3) : result • On IMDB with Word-CNN, the simplest model among the three, L-Shapley achieves the best performance while LIME, KernelSHAP and C-Shapley achieve slightly worse performance. Better
  • 29. Experiment(2/3) : Image classification • Datasets : • A subset of MNIST : Only “3" and “8” are included. • A subset of CIFAR-10 : Only deers(鹿) and horses(馬) are included.
  • 30. Experiment(2/3) : Result • C-Shapley outperform other methods in both datasets. Better
  • 31. Experiment(2/3) : Examples for misclassified images • Above image is “3”, and below image is “8”. They are misclassified into “8”, and “3”, respectively. • The masked pixels are colored with red if activated, (white) and blue otherwise. • The result seems to show the “reasoning” of the classifier.
  • 32. Experiment(3/3) : Evaluation by human subjects (5 people) • They use Amazon Mechanical Turk to compare L-Shapley, C- Shapley and KernelSHAP on IMDB movie reviews (200 reviews). • Experimental purpose : • Are humans able to make a decision with top words alone? • Are humans unable to make a decision with top words masked? • They ask subjects to classify the sentiment of texts into five categories : strongly positive (+2), positive (+1), neutral (0), negative (-1), strongly negative (-2).
  • 33. Experiment(3/3) : Evaluation by human subjects (5 people) • Texts have three types : 1. raw reviews 2. top 10 words of each review ranked by L-Shapley, C-Shapley and KernelSHAP 3. reviews with top words being masked • Masked words are produced by the L-Shapley, C-Shapley, and KernelSHAP until the probability score of the correct class produced by the model is lower than 10%. • Around 14.6% of words in each review are masked for L-Shapley and C- Shapley, and 31.6% for KernelSHAP.
  • 34. Experiment(3/3) : Evaluation by human subjects (5 people) • Evaluation metrics : • Consistency (0 or 1) between true labels and labels from human subjects. • Standard deviation of scores on each reviews • This is as a measure of disagreement between humans. • The absolute value of the averaged scores. • This is as a measure of confidence of decision.
  • 35. Experiment(3/3) : result • Humans become more consistent and confident when they are presented with top words. On the other hand, when top words are masked, humans are easier to make mistakes and are less certain. • C-Shapley yields the highest performance in terms of consistency, agreement, and confidence. • L-Shapley harms the human judgement the most among the three algorithms.
  • 36. Conclusion • They have proposed two algorithms; L-Shapley and C- Shapley for instance-wise and model-agnostic interpretation. • They demonstrated the superior performance of these algorithms compared to other methods on black-box models in both text and image classification with both quantitative metrics and human evaluation.
  翻译: