Data Mining and Data
Warehousing
CSE-4107
Md. Manowarul Islam
Associate Professor, Dept. of CSE
Jagannath University
Md. Manowarul Islam, Dept. Of CSE, JnU
What is classification?
🞐 Classification is the task of learning a target
function f that maps attribute set x to one of the
predefined class labels y
🞐 The target function f is known as a classification
model
What is classification?
🞐 One of the attributes is
the class attribute
🞐 In this case: Cheat
🞐 Two class labels (or
classes): Yes (1), No (0)
[Table: training records with the attributes Refund (categorical), Marital Status (categorical), Taxable Income (continuous) and the class attribute Cheat]
Classification vs. Prediction
🞐 Classification
■ predicts categorical class labels (discrete or
nominal)
■ classifies data (constructs a model) based on
the training set and the values (class labels) in
a classifying attribute and uses it in classifying
new data
🞐 Prediction
■ models continuous-valued functions
■ predicts unknown or missing values
Classification vs. Prediction
🞐 Descriptive modeling: an explanatory tool to
distinguish between objects of different classes
(e.g., understand why people cheat on their
taxes)
🞐 Predictive modeling: predict the class of a
previously unseen record
Why Classification?
🞐 Credit approval
■ A bank wants to classify its customers based on whether
they are expected to pay back their approved loans
■ The history of past customers is used to train the
classifier
■ The classifier provides rules, which identify potentially
reliable future customers
■ Classification rule:
🞐 If age = “31...40” and income = high then credit_rating =
excellent
■ Future customers
🞐 Paul: age = 35, income = high ⇒ excellent credit rating
🞐 John: age = 20, income = medium ⇒ fair credit rating
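The credit-approval rule can be written as a small function. This is only a sketch: the slide states a single rule (for "excellent"), so the "fair" fallback used here is an assumption made to reproduce John's rating, and the function name `credit_rating` is hypothetical.

```python
def credit_rating(age: int, income: str) -> str:
    """Apply the single classification rule quoted on the slide."""
    # Rule from the slide: age in 31...40 and income = high -> excellent
    if 31 <= age <= 40 and income == "high":
        return "excellent"
    # Assumption: every customer not covered by the rule is rated "fair"
    return "fair"


# Future customers from the slide: name -> (age, income)
customers = {"Paul": (35, "high"), "John": (20, "medium")}
ratings = {name: credit_rating(age, income)
           for name, (age, income) in customers.items()}
print(ratings)
```

A real classifier would learn many such rules from the history of past customers rather than hard-coding one.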
Classification: A Two-Step Process
🞐 Model construction: describing a set of
predetermined classes
■ Each tuple/sample is assumed to belong to a
predefined class, as determined by the class
label attribute
■ The set of tuples used for model construction:
training set
■ The model is represented as classification
rules, decision trees, or mathematical
formulae
Classification: A Two-Step Process
🞐 Model usage: for classifying future or unknown
objects
■ Estimate accuracy of the model
🞐 The known label of each test sample is
compared with the classified result from the
model
🞐 Accuracy rate is the percentage of test set
samples that are correctly classified by the
model
🞐 The test set must be independent of the training set;
otherwise the accuracy estimate is overly optimistic and
over-fitting goes undetected
Model Construction
[Figure: the training data is fed to a classification algorithm, which produces the classifier (model)]
Example of a learned rule:
IF rank = ‘professor’
OR years > 6
THEN tenured = ‘yes’
Use the Model in Prediction
[Figure: the classifier is evaluated on the testing data and then applied to unseen data]
Unseen data: (Jeff, Professor, 4)
Tenured?
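Applying the learned tenure rule to the unseen record can be sketched as follows. Two details are assumptions: the rank is lower-cased so that "Professor" matches 'professor', and a record that does not satisfy the rule is answered with "no" (the slide only states the "yes" case).

```python
def is_tenured(rank: str, years: int) -> bool:
    # Rule from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    # Assumption: comparison is case-insensitive.
    return rank.lower() == "professor" or years > 6


name, rank, years = ("Jeff", "Professor", 4)  # unseen record from the slide
answer = "yes" if is_tenured(rank, years) else "no"  # "no" branch is an assumption
print(f"{name}: tenured = {answer}")
```

Jeff satisfies the first condition (rank), so the years condition does not matter for this record.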
Illustrating Classification Task
[Figure: illustration of the classification task]
Decision Tree Classification Task
[Figure: the classification task with a decision tree as the model]
Supervised vs. Unsupervised Learning
🞐 Supervised learning (classification)
■ Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
■ New data is classified based on the training set
🞐 Unsupervised learning (clustering)
■ The class labels of the training data are unknown
■ Given a set of measurements, observations, etc., the
aim is to establish the existence of classes or
clusters in the data
Classification and prediction: Data Preparation
🞐 Data cleaning
■ Preprocess data in order to reduce noise and handle
missing values
🞐 Relevance analysis (feature selection)
■ Remove the irrelevant or redundant attributes
🞐 Data transformation
■ Generalize and/or normalize data
🞐 numerical attribute income ⇒ categorical
{low, medium, high}
🞐 normalize all numerical attributes to [0,1]
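The two data transformations can be illustrated with a short sketch. The cut points for income (30,000 and 70,000) and the sample incomes are hypothetical values chosen for illustration; the normalization shown is plain min-max scaling, which is one common way to map values to [0,1].

```python
def discretize_income(income: float,
                      low_max: float = 30_000,
                      medium_max: float = 70_000) -> str:
    """Generalize a numerical income to {low, medium, high}.

    The cut points are hypothetical, not taken from the slides.
    """
    if income <= low_max:
        return "low"
    if income <= medium_max:
        return "medium"
    return "high"


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: map every value to 0 to avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20_000, 50_000, 120_000]  # made-up sample values
categories = [discretize_income(x) for x in incomes]
normalized = min_max_normalize(incomes)
print(categories)
print(normalized)
```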
Evaluating Classification Methods
🞐 Predictive accuracy
🞐 Speed
■ time to construct the model
■ time to use the model
🞐 Robustness
■ handling noise and missing values
🞐 Scalability
■ efficiency in disk-resident databases
🞐 Interpretability
■ understanding and insight provided by the model
🞐 Goodness of rules (quality)
■ decision tree size
■ compactness of classification rules
Evaluation of classification models
🞐 Counts of test records that are correctly (or
incorrectly) predicted by the classification model
🞐 Confusion matrix

                         Predicted Class
                         Class = 1    Class = 0
Actual      Class = 1    f11          f10
Class       Class = 0    f01          f00
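The four counts of the confusion matrix, and the accuracy derived from them, can be computed as in the sketch below. The label lists are made-up illustration data; accuracy is taken as the fraction of correctly classified test records, (f11 + f00) / (f11 + f10 + f01 + f00).

```python
def confusion_counts(actual, predicted):
    """Return (f11, f10, f01, f00); f_ij counts records of actual class i
    that the model predicted as class j."""
    f11 = f10 = f01 = f00 = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            f11 += 1
        elif a == 1 and p == 0:
            f10 += 1
        elif a == 0 and p == 1:
            f01 += 1
        else:
            f00 += 1
    return f11, f10, f01, f00


def accuracy(f11, f10, f01, f00):
    """Fraction of test records on the diagonal of the confusion matrix."""
    return (f11 + f00) / (f11 + f10 + f01 + f00)


# Made-up test labels (1 / 0) and model predictions
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
print(counts, accuracy(*counts))
```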
Classification Techniques
🞐Decision Tree based Methods
🞐Rule-based Methods
🞐Memory based reasoning
🞐Neural Networks
🞐Naïve Bayes and Bayesian Belief Networks
🞐Support Vector Machines
Decision Trees
🞐 Decision tree
■ A flow-chart-like tree structure
■ Internal node denotes a test on an attribute
■ Branch represents an outcome of the test
■ Leaf nodes represent class labels or class
distribution
Example of a Decision Tree
[Figure: the training data (Refund: categorical, Marital Status: categorical, Taxable Income: continuous, Cheat: class) next to the induced model]
Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc; branches are test outcomes, leaves are class labels)
Refund?
  Yes → NO
  No → MarSt?
    Married → NO
    Single, Divorced → TaxInc?
      < 80K → NO
      > 80K → YES
Another Example of Decision Tree
[Figure: the same training data (categorical, categorical, continuous, class) with a different tree]
MarSt?
  Married → NO
  Single, Divorced → Refund?
    Yes → NO
    No → TaxInc?
      < 80K → NO
      > 80K → YES
There could be more than one tree that fits
the same data!
Apply Model to Test Data
Test Data
Refund   Marital Status   Taxable Income   Cheat
No       Married          80K              ?
[Figure: the decision tree with root Refund, then MarSt, then TaxInc, traversed one step per slide]
🞐 Start from the root of tree.
🞐 Refund = No: follow the No branch to the MarSt test.
🞐 MarSt = Married: follow the Married branch to the leaf NO.
🞐 Assign Cheat to “No”
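The traversal from the root to a leaf can be sketched in code. The nested-dictionary encoding of the tree and the helper names are hypothetical choices for illustration; the tree itself is the Refund / MarSt / TaxInc tree from the example slides. Sending an income of exactly 80K to the upper branch is an assumption, in line with the ">= 80K" label that appears in the Hunt's algorithm figure.

```python
def taxinc_branch(record):
    # Assumption: exactly 80K goes to the ">= 80K" branch
    return "< 80K" if record["TaxInc"] < 80_000 else ">= 80K"


# Internal node: {"test": function returning a branch label, "branches": {...}}
# Leaf: the class label for Cheat, as a string
TREE = {
    "test": lambda r: r["Refund"],
    "branches": {
        "Yes": "No",
        "No": {
            "test": lambda r: "Married" if r["MarSt"] == "Married"
                              else "Single, Divorced",
            "branches": {
                "Married": "No",
                "Single, Divorced": {
                    "test": taxinc_branch,
                    "branches": {"< 80K": "No", ">= 80K": "Yes"},
                },
            },
        },
    },
}


def classify(node, record):
    """Start at the root and follow test outcomes until a leaf is reached."""
    while isinstance(node, dict):
        outcome = node["test"](record)
        node = node["branches"][outcome]
    return node


test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80_000}
print("Cheat =", classify(TREE, test_record))
```

For the test record, the TaxInc test is never evaluated, because the Married branch already ends in a leaf.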
General Structure of Hunt’s Algorithm
🞐 Let Dt be the set of training records that
reach a node t
🞐 General Procedure:
■ If Dt contains records that belong to the
same class yt, then t is a leaf node
labeled as yt
■ If Dt contains records with the same
attribute values, then t is a leaf node
labeled with the majority class yt
■ If Dt is an empty set, then t is a leaf
node labeled by the default class, yd
■ If Dt contains records that belong to
more than one class, use an attribute
test to split the data into smaller
subsets.
🞐 Recursively apply the procedure to each
subset.
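The general procedure can be sketched recursively. This is a minimal illustration, not a full implementation: it handles categorical attributes with multiway splits only, and the rule for choosing the attribute test (take the first attribute on which the records still differ) is a placeholder assumption, since the procedure above does not fix a selection criterion. The function names and the tiny dataset are hypothetical.

```python
from collections import Counter


def majority(labels, default):
    """Most frequent label, or the default class if there are no labels."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def hunt(records, attributes, default):
    """records: list of (attribute_dict, label) pairs.
    Returns a class label (leaf) or a tuple (attribute, {value: subtree})."""
    if not records:                        # Dt is empty -> default class yd
        return default
    labels = [y for _, y in records]
    if len(set(labels)) == 1:              # all records in the same class yt
        return labels[0]
    # Placeholder attribute test: first attribute with more than one value
    for attr in attributes:
        values = {x[attr] for x, _ in records}
        if len(values) > 1:
            break
    else:                                  # identical attribute values -> majority class
        return majority(labels, default)
    remaining = [a for a in attributes if a != attr]
    node_default = majority(labels, default)
    # Split Dt into smaller subsets and apply the procedure recursively
    return (attr, {
        v: hunt([(x, y) for x, y in records if x[attr] == v],
                remaining, node_default)
        for v in sorted(values)
    })


# Made-up records: (attributes, Cheat label)
data = [
    ({"Refund": "Yes", "MarSt": "Single"}, "No"),
    ({"Refund": "No", "MarSt": "Married"}, "No"),
    ({"Refund": "No", "MarSt": "Single"}, "Yes"),
    ({"Refund": "No", "MarSt": "Married"}, "No"),
]
tree = hunt(data, ["Refund", "MarSt"], default="No")
print(tree)
```

Because the split only creates branches for values that occur in Dt, the empty-set case is not reached in this sketch; it is kept to mirror the procedure.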
Hunt’s Algorithm
[Figure: the tree is grown step by step on the tax-cheat training data]
🞐 Step 1: a single leaf labeled Don’t Cheat
🞐 Step 2: split on Refund (Yes → Don’t Cheat, No → Don’t Cheat)
🞐 Step 3: under Refund = No, split on Marital Status
(Single, Divorced → Cheat, Married → Don’t Cheat)
🞐 Step 4: under Single, Divorced, split on Taxable Income
(< 80K → Don’t Cheat, >= 80K → Cheat)
Tree Induction
🞐Finding the best decision tree is NP-hard
🞐Greedy strategy.
■ Split the records based on an attribute test
that optimizes a certain criterion.
🞐Many Algorithms:
■Hunt’s Algorithm (one of the earliest)
■CART
■ID3, C4.5
■SLIQ, SPRINT
Classification by Decision Tree Induction
🞐 Decision tree
■ A flow-chart-like tree structure
■ Internal node denotes a test on an attribute
■ Branch represents an outcome of the test
■ Leaf nodes represent class labels or class distribution
🞐 Decision tree generation consists of two phases
■ Tree construction
🞐 At start, all the training examples are at the root
🞐 Partition examples recursively based on selected attributes
■ Tree pruning
🞐 Identify and remove branches that reflect noise or outliers
🞐 Use of decision tree: Classifying an unknown sample
■ Test the attribute values of the sample against the decision
tree
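Training Dataset
[Table: 14 class-labeled training tuples described by the attributes age, income, student and credit rating, with the class attribute buys_computer]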
Output: A Decision Tree for
“buys_computer”
age?
  <=30 → student?
    no → no
    yes → yes
  31…40 → yes
  >40 → credit rating?
    excellent → no
    fair → yes
Algorithm for Decision Tree Induction
🞐 Basic algorithm (a greedy algorithm)
■ Tree is constructed in a top-down recursive divide-and-conquer
manner
■ At start, all the training examples are at the root
■ Attributes are categorical (if continuous-valued, they are
discretized in advance)
■ Samples are partitioned recursively based on selected attributes
■ Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
🞐 Conditions for stopping partitioning
■ All samples for a given node belong to the same class
■ There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
■ There are no samples left
Attribute Selection Measure:
🞐 Information Gain (ID3/C4.5)
🞐 Select the attribute with the highest information gain
[Figure: the decision tree for buys_computer, with age? as the root test and student? and credit rating? as the second-level tests]
Attribute Selection Measure:
🞐 Let D, the data partition, be a training set of
class-labeled tuples.
🞐 There are m distinct classes, Ci (for i = 1,…,m).
🞐 Let Ci,D be the set of tuples in D that belong to class Ci.
🞐 |Ci,D| and |D| denote the number of tuples in Ci,D and in D,
respectively.
Attribute Selection Measure:
🞐 Let pi be the probability that an arbitrary tuple
in D belongs to class Ci, estimated by
■ pi = |Ci,D|/|D|
🞐 Expected information (entropy) needed to
classify a tuple in D:
$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
Training Dataset
🞐 The class label attribute, buys_computer, has
two distinct values (yes, no)
🞐 There are therefore two distinct classes
(that is, m = 2)
🞐 Let class C1 correspond to yes
and class C2 correspond to no
🞐 There are nine tuples of class
yes and five tuples of class no
🞐 Class C1: buys_computer = “yes”
🞐 Class C2: buys_computer = “no”
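Plugging these counts into the standard entropy definition, with p1 = 9/14 and p2 = 5/14, gives the expected information for the whole training set. The following is a worked example of that arithmetic:

```latex
\begin{align*}
Info(D) &= -\frac{9}{14}\log_2\!\left(\frac{9}{14}\right)
           -\frac{5}{14}\log_2\!\left(\frac{5}{14}\right) \\
        &\approx 0.410 + 0.530 \\
        &\approx 0.940 \text{ bits}
\end{align*}
```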
Attribute Selection: Information Gain
■ Suppose we want to partition the tuples in D on some
attribute A having v distinct values, {a1, a2, … , av}
■ Attribute A can be used to split D into v partitions or
subsets, {D1, D2, … , Dv},
■ where Dj contains those tuples in D that have
outcome aj of A
■ Information needed (after using A to split D into v
partitions) to classify D:
$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$$
■ Information gained by branching on attribute A, where
Info(D) is the expected information (entropy) of D:
$$Gain(A) = Info(D) - Info_A(D)$$
Attribute Selection: Information Gain
🞐 Class C1: buys_computer = “yes”
🞐 Class C2: buys_computer = “no”

Age      Tuples     C1 (yes)   C2 (no)
<=30     5 of 14    2          3
31…40    4 of 14    4          0
>40      5 of 14    3          2
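The information gain for age can be checked numerically from the counts in the table. The sketch below assumes the usual definitions (entropy of the class distribution, and the size-weighted entropy of the partitions after the split); the function names are hypothetical.

```python
from math import log2


def info(counts):
    """Expected information (entropy) in bits for a list of class counts."""
    total = sum(counts)
    # Classes with zero tuples contribute nothing (0 * log2(0) is taken as 0)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def info_after_split(partitions):
    """Size-weighted entropy of the partitions produced by a split."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)


# (yes, no) counts per age branch, taken from the table
age_partitions = {"<=30": (2, 3), "31…40": (4, 0), ">40": (3, 2)}

info_d = info((9, 5))                                   # whole training set
info_age = info_after_split(list(age_partitions.values()))
gain_age = info_d - info_age
print(f"Info(D) = {info_d:.3f}, Info_age(D) = {info_age:.3f}, "
      f"Gain(age) = {gain_age:.3f}")
```

The 31…40 partition contains only yes tuples, so its entropy is zero and it adds nothing to the weighted sum.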
Attribute Selection: Information Gain
Splitting the samples using age
[Figure: the root test age? splits the tuples into the branches <=30, 31…40 and >40; the 31…40 partition becomes a leaf labeled yes]
Gain Ratio for Attribute Selection (C4.5)
🞐 The information gain measure is biased toward
tests with many outcomes
🞐 Consider an attribute that acts as a unique
identifier, such as product_ID
🞐 A split on product_ID would result in a large
number of partitions, each containing a single tuple
🞐 Every partition is pure, so Info_product_ID(D) = 0
🞐 Information gained by partitioning on this
attribute is therefore maximal
🞐 Such a partitioning is useless for classification
Gain Ratio for Attribute Selection (C4.5)
🞐 Information gain measure is biased towards
attributes with a large number of values
🞐 C4.5 (a successor of ID3) uses gain ratio to
overcome the problem (normalization to
information gain)
$$SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right)$$
$$GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)}$$

Income    Tuples
low       4 of 14
medium    6 of 14
high      4 of 14
🞐 Ex. SplitInfo_income(D) = −(4/14)·log2(4/14) − (6/14)·log2(6/14)
− (4/14)·log2(4/14) = 1.557
🞐 Ex. gain_ratio(income) = 0.029/1.557 = 0.019
🞐 The attribute with the maximum gain ratio is
selected as the splitting attribute
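The gain ratio for income can be recomputed from the partition sizes (4, 6 and 4 of 14 tuples). In this sketch the split information is taken as the entropy of the partition sizes, and Gain(income) = 0.029 is the value quoted on the slide rather than something computed here; the function name is hypothetical.

```python
from math import log2


def split_info(partition_sizes):
    """Entropy of the partition sizes produced by splitting on an attribute."""
    total = sum(partition_sizes)
    return -sum(n / total * log2(n / total) for n in partition_sizes if n > 0)


income_sizes = [4, 6, 4]   # low, medium, high (out of 14 tuples)
gain_income = 0.029        # Gain(income) as quoted on the slide

si = split_info(income_sizes)
ratio = gain_income / si
print(f"SplitInfo_income(D) = {si:.3f}, gain_ratio(income) = {ratio:.3f}")
```

With these counts the split information comes out at about 1.557, which puts the gain ratio for income at about 0.019.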
Thank you
  • 1. Data Mining and Data Warehousing CSE-4107 Md. Manowarul Islam Associate Professor, Dept. of CSE Jagannath University
  • 2. Md. Manowarul Islam, Dept. Of CSE, JnU What is classification? 🞐 Classification is the task of learning a target function f that maps attribute set x to one of the predefined class labels y 🞐 The target function f is known as a classification model
  • 3. Md. Manowarul Islam, Dept. Of CSE, JnU What is classification? 🞐 One of the attributes is the class attribute 🞐 In this case: Cheat 🞐 Two class labels (or classes): Yes (1), No (0) categorical categorical continuous class
  • 4. Md. Manowarul Islam, Dept. Of CSE, JnU 🞐 Classification ■predicts categorical class labels (discrete or nominal) ■classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data 🞐 Prediction ■models continuous-valued functions, ■predicts unknown or missing values Classification vs. Prediction
  • 5. Md. Manowarul Islam, Dept. Of CSE, JnU 🞐 Descriptive modeling: Explanatory tool to distinguish between objects of different classes (e.g., understand why people cheat on their taxes) 🞐 Predictive modeling: Predict a class of a previously unseen record Classification vs. Prediction
  • 6. Md. Manowarul Islam, Dept. Of CSE, JnU Classification vs. Prediction
  • 7. Md. Manowarul Islam, Dept. Of CSE, JnU 🞐 Credit approval ■ A bank wants to classify its customers based on whether they are expected to pay back their approved loans ■ The history of past customers is used to train the classifier ■ The classifier provides rules, which identify potentially reliable future customers ■ Classification rule: 🞐 If age = “31...40” and income = high then credit_rating = excellent ■ Future customers 🞐 Paul: age = 35, income = high excellent credit rating ⇒ 🞐 John: age = 20, income = medium fair credit rating ⇒ Why Classification?
  • 8. Classification: A Two-Step Process 🞐 Model construction: describing a set of predetermined classes ■ Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute ■ The set of tuples used for model construction is the training set ■ The model is represented as classification rules, decision trees, or mathematical formulae
  • 9. Classification: A Two-Step Process 🞐 Model usage: for classifying future or unknown objects ■ Estimate accuracy of the model 🞐 The known label of each test sample is compared with the classified result from the model 🞐 Accuracy rate is the percentage of test set samples that are correctly classified by the model 🞐 The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic and over-fitting goes undetected
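The accuracy rate described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function name `accuracy` and the label lists are hypothetical, and the labels are assumed to come from a test set held out from training.

```python
def accuracy(known_labels, predicted_labels):
    """Fraction of test samples whose predicted label matches the known label."""
    if len(known_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(k == p for k, p in zip(known_labels, predicted_labels))
    return correct / len(known_labels)


# Hypothetical test-set labels (not taken from the slides).
known = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no"]

print(f"accuracy = {accuracy(known, predicted):.0%}")  # 4 of 5 correct
```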
  • 10. Model Construction: Training Data → Classification Algorithms → Classifier (Model): IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
  • 11. Use the Model in Prediction: the Classifier is applied to Testing Data and then to Unseen Data, e.g. (Jeff, Professor, 4): Tenured?
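The rule learned in the model-construction step (IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’) can be applied to the unseen tuple directly. The sketch below assumes the rule is the whole model; the function name `tenured` and the second example tuple are hypothetical.

```python
def tenured(rank: str, years: int) -> str:
    """Apply the rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Unseen data from the slide: (Jeff, Professor, 4)
print("Jeff:", tenured("Professor", 4))

# A hypothetical tuple that satisfies neither condition.
print("Hypothetical assistant professor, 3 years:", tenured("Assistant Prof", 3))
```

Jeff satisfies the rank condition, so the rule answers "yes" even though his 4 years do not exceed 6.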
  • 12. Illustrating Classification Task
  • 13. Decision Tree Classification Task
  • 14. Supervised vs. Unsupervised Learning 🞐 Supervised learning (classification) ■ Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations ■ New data is classified based on the training set 🞐 Unsupervised learning (clustering) ■ The class labels of the training data are unknown ■ Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
  • 15. Classification and Prediction: Data Preparation 🞐 Data cleaning ■ Preprocess data in order to reduce noise and handle missing values 🞐 Relevance analysis (feature selection) ■ Remove the irrelevant or redundant attributes 🞐 Data transformation ■ Generalize and/or normalize data 🞐 numerical attribute income ⇒ categorical {low, medium, high} 🞐 normalize all numerical attributes to [0,1]
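The two data transformations named on this slide can be sketched as follows. This is an illustration under stated assumptions: min-max scaling is one common way to normalize to [0,1], and the cut-off values `low_max` and `medium_max` are hypothetical, since the slides do not say where the income categories begin and end.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def discretize_income(income, low_max=40_000, medium_max=80_000):
    """Generalize a numeric income to {low, medium, high}; thresholds are assumed."""
    if income <= low_max:
        return "low"
    if income <= medium_max:
        return "medium"
    return "high"


incomes = [20_000, 50_000, 120_000]  # hypothetical values
print(min_max_normalize(incomes))
print([discretize_income(x) for x in incomes])
```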
  • 16. Evaluating Classification Methods 🞐 Predictive accuracy 🞐 Speed ■ time to construct the model ■ time to use the model 🞐 Robustness ■ handling noise and missing values 🞐 Scalability ■ efficiency in disk-resident databases 🞐 Interpretability ■ understanding and insight provided by the model 🞐 Goodness of rules (quality) ■ decision tree size ■ compactness of classification rules
  • 17. Evaluation of classification models 🞐 Counts of test records that are correctly (or incorrectly) predicted by the classification model 🞐 Confusion matrix (rows: actual class; columns: predicted class): Actual Class = 1: f11 (predicted 1), f10 (predicted 0); Actual Class = 0: f01 (predicted 1), f00 (predicted 0)
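The four confusion-matrix counts can be tallied from paired actual and predicted labels, as in the sketch below. The function names and the label lists are hypothetical; the key follows the slide's notation, where the first index of f is the actual class and the second is the predicted class. Accuracy is then the share of correct predictions, (f11 + f00) / (f11 + f10 + f01 + f00).

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Tally f_ij = number of records with actual class i predicted as class j."""
    pairs = Counter(zip(actual, predicted))
    return {
        "f11": pairs[(1, 1)],
        "f10": pairs[(1, 0)],
        "f01": pairs[(0, 1)],
        "f00": pairs[(0, 0)],
    }


def accuracy_from_counts(f):
    """Correct predictions (f11 + f00) divided by all predictions."""
    return (f["f11"] + f["f00"]) / sum(f.values())


# Hypothetical test records (1 = Yes, 0 = No).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

counts = confusion_counts(actual, predicted)
print(counts)
print("accuracy =", accuracy_from_counts(counts))
```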
  • 18. Classification Techniques 🞐 Decision Tree based Methods 🞐 Rule-based Methods 🞐 Memory based reasoning 🞐 Neural Networks 🞐 Naïve Bayes and Bayesian Belief Networks 🞐 Support Vector Machines
  • 19. Decision Trees 🞐 Decision tree ■ A flow-chart-like tree structure ■ Internal node denotes a test on an attribute ■ Branch represents an outcome of the test ■ Leaf nodes represent class labels or class distribution
  • 20. Example of a Decision Tree. Training data columns: categorical, categorical, continuous, class. Model (decision tree): Refund? Yes → NO; No → MarSt? Married → NO; Single, Divorced → TaxInc? < 80K → NO; > 80K → YES. Internal nodes are the splitting attributes, branches are test outcomes, and leaves are class labels.
  • 21. Another Example of Decision Tree. Same training data (categorical, categorical, continuous, class). Model: MarSt? Married → NO; Single, Divorced → Refund? Yes → NO; No → TaxInc? < 80K → NO; > 80K → YES. There could be more than one tree that fits the same data!
  • 22. Apply Model to Test Data. Start from the root of the tree. Tree: Refund? Yes → NO; No → MarSt? Married → NO; Single, Divorced → TaxInc? < 80K → NO; > 80K → YES. Test data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
  • 27. Apply Model to Test Data. The test record (Refund = No, Marital Status = Married, Taxable Income = 80K) follows the Refund = No branch, then the MarSt = Married branch, and reaches a leaf labeled NO. Assign Cheat to “No”.
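The traversal shown across these slides can be written as nested attribute tests. This is a sketch of the example tree only, with hypothetical field names (`Refund`, `MarSt`, `TaxInc`, income in thousands). The tree figure labels the income branches "< 80K" and "> 80K"; treating exactly 80K as the upper branch is an assumption borrowed from the ">= 80K" label on the Hunt's algorithm slide.

```python
def classify(record):
    """Walk the example tree from the root and return the predicted Cheat label."""
    if record["Refund"] == "Yes":
        return "No"                      # leaf under Refund = Yes
    if record["MarSt"] == "Married":
        return "No"                      # leaf under MarSt = Married
    # MarSt is Single or Divorced: test taxable income (in thousands).
    return "No" if record["TaxInc"] < 80 else "Yes"   # >= 80 is an assumption


test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
print("Cheat =", classify(test_record))
```

For the slide's test record the income test is never reached, because the Married branch ends in a leaf.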
  • 28. General Structure of Hunt’s Algorithm 🞐 Let Dt be the set of training records that reach a node t 🞐 General procedure: ■ If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt ■ If Dt contains records with the same attribute values, then t is a leaf node labeled with the majority class yt ■ If Dt is an empty set, then t is a leaf node labeled by the default class, yd ■ If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets 🞐 Recursively apply the procedure to each subset
  • 29. Hunt’s Algorithm (step 1): a single leaf labeled Don’t Cheat
  • 30. Hunt’s Algorithm (step 2): Refund? Yes → Don’t Cheat; No → Don’t Cheat
  • 31. Hunt’s Algorithm (step 3): Refund? Yes → Don’t Cheat; No → Marital Status? Single, Divorced → Cheat; Married → Don’t Cheat
  • 32. Hunt’s Algorithm (step 4): Refund? Yes → Don’t Cheat; No → Marital Status? Married → Don’t Cheat; Single, Divorced → Taxable Income? < 80K → Don’t Cheat; >= 80K → Cheat
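The general procedure of Hunt's algorithm can be sketched recursively. This is a simplified illustration, not the deck's implementation: it handles categorical attributes only, and it splits on the first remaining attribute rather than choosing one by a selection measure. The function name `hunt`, the record layout, and the toy records are hypothetical.

```python
from collections import Counter


def hunt(records, attributes, default="No"):
    """records: list of (attribute_dict, class_label); returns a label or (attr, subtrees)."""
    if not records:
        return default                                  # empty D_t: default class
    labels = [label for _, label in records]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return labels[0]                                # all records in one class
    first = records[0][0]
    if not attributes or all(x == first for x, _ in records):
        return majority                                 # cannot split further
    attr, rest = attributes[0], attributes[1:]          # assumed: no selection measure
    values = sorted({x[attr] for x, _ in records})
    return (attr, {
        v: hunt([(x, y) for x, y in records if x[attr] == v], rest, majority)
        for v in values
    })


# Hypothetical toy records.
toy = [
    ({"Refund": "Yes", "MarSt": "Single"}, "No"),
    ({"Refund": "No", "MarSt": "Married"}, "No"),
    ({"Refund": "No", "MarSt": "Single"}, "Yes"),
]
print(hunt(toy, ["Refund", "MarSt"]))
```

Passing the parent's majority class down as the default mirrors the empty-set rule, although with branch values drawn from the data itself an empty subset only arises if the function is called with no records.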
  • 33. Tree Induction 🞐 Finding the best decision tree is NP-hard 🞐 Greedy strategy ■ Split the records based on an attribute test that optimizes a certain criterion 🞐 Many algorithms: ■ Hunt’s Algorithm (one of the earliest) ■ CART ■ ID3, C4.5 ■ SLIQ, SPRINT
  • 34. Classification by Decision Tree Induction 🞐 Decision tree ■ A flow-chart-like tree structure ■ Internal node denotes a test on an attribute ■ Branch represents an outcome of the test ■ Leaf nodes represent class labels or class distribution 🞐 Decision tree generation consists of two phases ■ Tree construction 🞐 At start, all the training examples are at the root 🞐 Partition examples recursively based on selected attributes ■ Tree pruning 🞐 Identify and remove branches that reflect noise or outliers 🞐 Use of decision tree: classifying an unknown sample ■ Test the attribute values of the sample against the decision tree
  • 35. Training Dataset
  • 36. Output: A Decision Tree for “buys_computer”: age? <=30 → student? (no → no; yes → yes); 30..40 → yes; >40 → credit rating? (excellent → no; fair → yes)
  • 37. Algorithm for Decision Tree Induction 🞐 Basic algorithm (a greedy algorithm) ■ Tree is constructed in a top-down recursive divide-and-conquer manner ■ At start, all the training examples are at the root ■ Attributes are categorical (if continuous-valued, they are discretized in advance) ■ Samples are partitioned recursively based on selected attributes ■ Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain) 🞐 Conditions for stopping partitioning ■ All samples for a given node belong to the same class ■ There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf ■ There are no samples left
  • 38. Attribute Selection Measure: 🞐 Information Gain (ID3/C4.5) 🞐 Select the attribute with the highest information gain [Figure: the “buys_computer” decision tree, with age? at the root and branches <=30, 30..40, >40]
  • 39. Attribute Selection Measure: 🞐 Let D, the data partition, be a training set of class-labeled tuples 🞐 There are m distinct classes, $C_i$ (for i = 1, …, m) 🞐 Let $C_{i,D}$ be the set of tuples in D that belong to class $C_i$ 🞐 $|C_{i,D}|$ and $|D|$ denote the number of tuples in $C_{i,D}$ and in D, respectively
  • 40. Attribute Selection Measure: 🞐 Let $p_i$ be the probability that an arbitrary tuple in D belongs to class $C_i$, estimated by $p_i = |C_{i,D}| / |D|$ 🞐 Expected information (entropy) needed to classify a tuple in D: $Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
  • 41. Training Dataset 🞐 The class label attribute, buys_computer, has two distinct values (yes, no) 🞐 There are therefore two distinct classes (that is, m = 2) 🞐 Let class C1 correspond to yes and class C2 correspond to no 🞐 There are nine tuples of class yes and five tuples of class no
  • 42. Attribute Selection: Information Gain 🞐 Class C1: buys_computer = “yes” 🞐 Class C2: buys_computer = “no”
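With nine tuples of class yes and five of class no, the expected information (entropy) of D can be computed from the class counts alone. The sketch below assumes the usual definition, the negative sum of p_i · log2(p_i) over the classes with p_i = |C_i,D| / |D|; the function name `info` is hypothetical.

```python
from math import log2


def info(class_counts):
    """Expected information (entropy, in bits) for a list of per-class tuple counts."""
    total = sum(class_counts)
    # Classes with zero tuples contribute nothing (0 * log2(0) is taken as 0).
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


# Nine "yes" tuples and five "no" tuples, as stated on the slide.
print(f"Info(D) = {info([9, 5]):.3f} bits")
```

A partition in which every tuple shares one class, such as `info([4, 0])`, has zero entropy, which is why pure partitions need no further splitting.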
  • 43. Attribute Selection: Information Gain ■ Suppose we want to partition the tuples in D on some attribute A having v distinct values, $\{a_1, a_2, \dots, a_v\}$ ■ Attribute A can be used to split D into v partitions or subsets, $\{D_1, D_2, \dots, D_v\}$, where $D_j$ contains those tuples in D that have outcome $a_j$ of A ■ Information needed (after using A to split D into v partitions) to classify D: $Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$ ■ Information gained by branching on attribute A: $Gain(A) = Info(D) - Info_A(D)$
  • 44. Attribute Selection: Information Gain 🞐 Class C1: buys_computer = “yes” 🞐 Class C2: buys_computer = “no” 🞐 Age partitions (tuples out of 14; C1 = yes, C2 = no): <=30: 5 tuples (2 yes, 3 no); 31…40: 4 tuples (4 yes, 0 no); >40: 5 tuples (3 yes, 2 no)
  • 45. Attribute Selection: Information Gain 🞐 Age partitions (tuples out of 14; C1 = yes, C2 = no): <=30: 5 tuples (2 yes, 3 no); 31…40: 4 tuples (4 yes, 0 no); >40: 5 tuples (3 yes, 2 no)
  • 46. Attribute Selection: Information Gain
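The age table gives enough to work the information gain for age by hand or in code. The sketch below assumes the standard definitions: entropy as the negative sum of p · log2(p), the information after a split as the size-weighted entropy of the partitions, and gain as the difference. The function names are hypothetical; the counts come from the slides.

```python
from math import log2


def info(class_counts):
    """Entropy in bits of a partition described by its per-class counts."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


def info_after_split(partitions):
    """Weighted entropy of the partitions: sum over j of |D_j|/|D| * Info(D_j)."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)


# (yes, no) counts per age group, from the slide's table.
age_partitions = [(2, 3), (4, 0), (3, 2)]   # <=30, 31...40, >40

info_d = info((9, 5))                       # nine yes, five no overall
info_age = info_after_split(age_partitions)
gain_age = info_d - info_age

print(f"Info(D)     = {info_d:.3f}")
print(f"Info_age(D) = {info_age:.3f}")
print(f"Gain(age)   = {gain_age:.3f}")
```

The 31…40 partition is pure, so it contributes nothing to the weighted entropy; only the two mixed partitions carry any remaining uncertainty.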
  • 47. Splitting the samples using age: age? has branches <=30, 30...40, and >40; the 30...40 partition is labeled yes
  • 48. Output: A Decision Tree for “buys_computer”: age? <=30 → student? (no → no; yes → yes); 30..40 → yes; >40 → credit rating? (excellent → no; fair → yes)
  • 49. Gain Ratio for Attribute Selection (C4.5) 🞐 The information gain measure is biased toward tests with many outcomes 🞐 Consider an attribute that acts as a unique identifier, such as product_ID 🞐 A split on product_ID would result in a large number of partitions, each containing a single tuple and therefore pure 🞐 $Info_{product\_ID}(D) = 0$ 🞐 The information gained by partitioning on this attribute is therefore maximal 🞐 Such a partitioning is useless for classification
  • 50. Gain Ratio for Attribute Selection (C4.5) 🞐 The information gain measure is biased towards attributes with a large number of values 🞐 C4.5 (a successor of ID3) uses gain ratio to overcome the problem by normalizing information gain with the split information: $SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2\left(\frac{|D_j|}{|D|}\right)$ and $GainRatio(A) = Gain(A) / SplitInfo_A(D)$
  • 51. Gain Ratio for Attribute Selection (C4.5) 🞐 Income partitions (tuples out of 14): low 4, medium 6, high 4
  • 52. Gain Ratio for Attribute Selection (C4.5) 🞐 Income partitions (tuples out of 14): low 4, medium 6, high 4 🞐 Ex. $SplitInfo_{income}(D) = -\frac{4}{14}\log_2\frac{4}{14} - \frac{6}{14}\log_2\frac{6}{14} - \frac{4}{14}\log_2\frac{4}{14} = 1.557$, so gain_ratio(income) = 0.029/1.557 = 0.019 🞐 The attribute with the maximum gain ratio is selected as the splitting attribute
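The gain ratio for income can be checked from the partition sizes in the table. The sketch below assumes the usual C4.5 definitions: split information is the entropy of the partition sizes, and gain ratio is gain divided by split information. Gain(income) = 0.029 is the value quoted on the slide; the function names are hypothetical.

```python
from math import log2


def split_info(partition_sizes):
    """Entropy of the partition sizes: -sum over j of |D_j|/|D| * log2(|D_j|/|D|)."""
    total = sum(partition_sizes)
    return -sum((n / total) * log2(n / total) for n in partition_sizes if n > 0)


def gain_ratio(gain, partition_sizes):
    """Information gain normalized by the split information of the same split."""
    return gain / split_info(partition_sizes)


income_sizes = [4, 6, 4]      # low, medium, high (out of 14 tuples)
gain_income = 0.029           # Gain(income) as quoted on the slide

print(f"SplitInfo_income(D) = {split_info(income_sizes):.3f}")
print(f"GainRatio(income)   = {gain_ratio(gain_income, income_sizes):.3f}")

# A unique identifier such as product_ID puts each of the 14 tuples in its own
# partition, so its split information is log2(14), the largest possible here.
print(f"SplitInfo_product_ID(D) = {split_info([1] * 14):.3f}")
```

Dividing by the split information is what penalizes many-valued attributes: the more evenly a split scatters tuples across many partitions, the larger the denominator.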
  • 53. Thank you

Editor's Notes

  • #43: I: the expected information needed to classify a given sample. E (entropy): the expected information based on the partitioning into subsets by A.