By : Mohd. Noor Abdul Hamid, Ph.D
(Universiti Utara Malaysia)
Introduction to Classification
Classification is the task of assigning objects to
one of several predefined categories or classes.
Given a collection of records (training set):
Each record contains a set of attributes; one of the attributes
is the class.
Find a model for the class attribute as a function of
the values of the other attributes.
Goal: previously unseen records should be
assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model.
Usually, the given data set is divided into training and test sets,
with training set used to build the model and test set used to
validate it.
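As a minimal sketch of this split-train-validate workflow in Python; the choice of scikit-learn and the iris data set is an assumption for illustration, not something the slides prescribe:

```python
# Minimal sketch of the train/test workflow described above, using
# scikit-learn (an assumed library; the slides name none).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                 # attributes X, class attribute y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)         # divide into training / test sets

model = DecisionTreeClassifier().fit(X_train, y_train)   # build model on training set
y_pred = model.predict(X_test)                            # validate on test set
print("Test-set accuracy:", accuracy_score(y_test, y_pred))
```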
Figure : General approach for building a classification model. A learning algorithm examines the Training Set and learns a model (induction); the model is then applied to the Test Set to classify unseen records (deduction).

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Introduction to Classification
Examples of classification techniques:
- Decision Tree
- Neural Network
- Rule-Based
- Naïve Bayes classifier, etc.
Classification techniques are most suited for
predicting data sets with binary or nominal
categories. They are less effective for ordinal
categories since they do not consider the implicit
order among the categories.
By : Mohd. Noor Abdul Hamid, Ph.D
(Universiti Utara Malaysia)
Introduction to Classification
The performance of a classification model is evaluated based on
the counts of test records correctly and incorrectly predicted
by the model. These counts are tabulated in a confusion matrix.

Figure : Confusion matrix for a 2-class problem

                     Predicted Class = 1   Predicted Class = 0
Actual Class = 1     f11                   f10
Actual Class = 0     f01                   f00

Based on the entries of the confusion matrix, the total number of:
- correct predictions made by the model is (f11 + f00)
- incorrect predictions is (f01 + f10).
By : Mohd. Noor Abdul Hamid, Ph.D
(Universiti Utara Malaysia)
Introduction to Classification
Therefore, we can evaluate the performance of a
classification model by looking at the accuracy of its
predictions. Equivalently, the performance of a model can
be expressed in terms of its error rate:
\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{f_{11} + f_{00}}{f_{11} + f_{10} + f_{01} + f_{00}}

\text{Error rate} = \frac{\text{Number of wrong predictions}}{\text{Total number of predictions}} = \frac{f_{10} + f_{01}}{f_{11} + f_{10} + f_{01} + f_{00}}
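A minimal sketch of these two formulas in Python; the confusion-matrix counts in the usage example are hypothetical:

```python
# Accuracy and error rate from the 2-class confusion matrix entries
# (f11, f10, f01, f00) defined above.
def accuracy(f11: int, f10: int, f01: int, f00: int) -> float:
    """Fraction of test records the model predicted correctly."""
    return (f11 + f00) / (f11 + f10 + f01 + f00)

def error_rate(f11: int, f10: int, f01: int, f00: int) -> float:
    """Fraction of test records the model predicted wrongly."""
    return 1.0 - accuracy(f11, f10, f01, f00)

print(accuracy(50, 10, 5, 35))    # 0.85
print(error_rate(50, 10, 5, 35))  # ~0.15
```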
Examples of Classification Tasks
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
What is a Decision Tree?
A decision tree is a structure that can be used to
divide up a large collection of records into
successively smaller sets of records by applying a
sequence of simple decision rules.
With each successive division, the members of the
resulting sets become more and more similar to each
other.
A decision tree model consists of a set of rules for
dividing a large heterogeneous population into
smaller, more homogeneous (mutually exclusive)
groups with respect to a particular target.
Hence, the algorithm used to construct decision
trees is referred to as recursive partitioning.
What is a Decision Tree?
The target variable is usually categorical, and the
decision tree is used either to:
calculate the probability that a given record belongs to
each of the categories, or
classify the record by assigning it to the most likely class
(or category).
Note : Decision trees can also be used to estimate the value of a
continuous target variable. However, regression models and
neural networks are generally more appropriate for
estimation.
What is a Decision Tree?
A decision tree has three types of nodes:
Root Node : the top (or left-most) node, with no incoming edges
and zero or more outgoing edges.
Child or Internal Node : a descendant node with exactly
one incoming edge and two or more outgoing edges.
Leaf Node : a terminal node with exactly one incoming
edge and no outgoing edges.
In a decision tree, each leaf node is assigned a class
label.
The rules or branches are the unique paths (edges),
with a set of conditions on attributes, that divide the
observations into smaller subsets.
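A minimal sketch of this node structure in Python; the class and field names are illustrative assumptions:

```python
# One node of a decision tree: internal nodes carry an attribute test and
# outgoing edges (children); leaf nodes carry only a class label.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class TreeNode:
    attribute: Optional[str] = None                                # test at root/internal nodes
    children: Dict[str, "TreeNode"] = field(default_factory=dict)  # outcome -> child node
    label: Optional[str] = None                                    # class label at leaf nodes

    def is_leaf(self) -> bool:
        return not self.children                                   # no outgoing edges
```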
Decision Tree Diagram

Gender (ROOT NODE)
  Female -> Height (INTERNAL NODE)
    < 1.3m      -> Short (LEAF NODE)
    1.3m - 1.8m -> Medium
    > 1.8m      -> Tall
  Male -> Height
    < 1.5m      -> Short
    1.5m - 2.0m -> Medium
    > 2.0m      -> Tall

(Each edge in the diagram is a BRANCH.)
Types of Decision Tree
Balanced Tree
Bushy Tree
Deep Tree
How to Build a Decision Tree?
Generally, building a decision tree involves two steps:
Tree construction → recursively split the nodes according to
selected attributes (test conditions),
Tree pruning → identify and remove irrelevant
branches (which may reflect outliers) to increase
classification accuracy.
How to Build a Decision Tree?
In principle, there are exponentially many decision
trees that can be constructed from a given set of
attributes → finding the optimal tree is computationally infeasible
because of the exponential size of the search space.
Efficient algorithms have been developed to induce
reasonably accurate, albeit suboptimal, decision
trees in a reasonable amount of time.
These algorithms usually employ a greedy strategy →
making a series of locally optimal decisions about
which attribute to use for partitioning the data.
One such algorithm is Hunt’s Algorithm,
which is the basis of many existing decision tree
algorithms, including ID3, C4.5 and CART.
Hunt’s Algorithm
Let Dt be the set of training records that are associated with
node t, and let y = {y1, y2, …, yc} be the set of class labels of the
target variable, where c is the number of classes.
The following is a recursive definition of Hunt’s algorithm:
Step 1 :
If all the records in Dt belong to the same class yt, then node t is a
leaf node labeled as yt.
Step 2 :
If Dt contains records that belong to more than one class, an
attribute test condition is selected to partition the records into
smaller subsets. A child node is created for each outcome of the
test condition, and the records in Dt are distributed to the children
based on the outcomes. The algorithm is then recursively applied
to each child node.
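A minimal sketch of these two steps in Python. Records are (attribute-dict, class-label) pairs; picking the first unused attribute is a placeholder assumption, since the real selection step chooses the test condition with the best impurity-based split (covered later):

```python
# Recursive sketch of Hunt's algorithm over a list of (record, label) pairs.
def hunt(records, attributes):
    classes = {label for _, label in records}
    if len(classes) == 1:                          # Step 1: pure node -> leaf
        return {"label": classes.pop()}
    if not attributes:                             # no tests left: majority-class leaf
        labels = [label for _, label in records]
        return {"label": max(set(labels), key=labels.count)}
    attr, rest = attributes[0], attributes[1:]     # Step 2: select a test condition
    node = {"attribute": attr, "children": {}}
    for value in {rec[attr] for rec, _ in records}:
        subset = [(rec, label) for rec, label in records if rec[attr] == value]
        node["children"][value] = hunt(subset, rest)   # recurse on each child
    return node
```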
Example of a Decision Tree

Training Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
Refund?
  Yes -> NO
  No  -> MarSt?
         Married -> NO
         Single, Divorced -> TaxInc?
                             < 80K -> NO
                             > 80K -> YES
Another Example of a Decision Tree

(Same training data as the previous example.)

Model: Decision Tree
MarSt?
  Married -> NO
  Single, Divorced -> Refund?
                      Yes -> NO
                      No  -> TaxInc?
                             < 80K -> NO
                             > 80K -> YES

There could be more than one tree that fits the same data!
Apply Model to Test Data

Model: Decision Tree
Refund?
  Yes -> NO
  No  -> MarSt?
         Married -> NO
         Single, Divorced -> TaxInc?
                             < 80K -> NO
                             > 80K -> YES

Test Data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and follow the branch that matches each attribute value: Refund = No leads to the MarSt node, and Marital Status = Married leads to the leaf labeled NO.
Assign Cheat to “No”.
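The walk just described can be written as nested conditionals; a minimal sketch of this example tree in Python (the record keys are illustrative assumptions):

```python
# Hard-coded version of the example decision tree above; leaf labels are
# the predicted values of the Cheat attribute.
def classify(record: dict) -> str:
    if record["Refund"] == "Yes":
        return "No"                                # Refund = Yes: leaf NO
    if record["MaritalStatus"] == "Married":
        return "No"                                # MarSt = Married: leaf NO
    return "No" if record["TaxableIncome"] < 80_000 else "Yes"

print(classify({"Refund": "No", "MaritalStatus": "Married",
                "TaxableIncome": 80_000}))         # -> No
```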
Design Issues of Decision Tree Induction
How should the training records be split?
Which attribute test condition works best to classify the
records?
What are the objective measures for evaluating the goodness
of each test condition?
How should the splitting procedure stop?
What is the condition for stopping the splitting of records?
One strategy is to continue expanding a node until either all the
records belong to the same class or all the records have
identical attribute values.
Other criteria can also be imposed to allow the tree-growing
procedure to terminate earlier.
Methods for Expressing Attribute Test Condition
a) Binary Attributes → generate two possible outcomes (binary split)
Example: GENDER → Male / Female
Methods for Expressing Attribute Test Condition
b) Nominal Attributes : Multiway split
Example: Marital Status → Single / Divorced / Married
Methods for Expressing Attribute Test Condition
b) Nominal Attributes : Binary split (e.g., in CART)
Marital Status → {Single} vs {Married, Divorced}
OR Marital Status → {Married} vs {Single, Divorced}
OR Marital Status → {Divorced} vs {Married, Single}
Methods for Expressing Attribute Test Condition
c) Ordinal Attributes : Multiway split
Example: Shirt Size → Small / Medium / Large / Extra Large
Methods for Expressing Attribute Test Condition
c) Ordinal Attributes : Binary split, as long as it does not violate
the order property of the attribute values.
Shirt Size → {S, M} vs {L, XL}
OR Shirt Size → {S} vs {M, L, XL}
(Note: a grouping such as {S, L} vs {M, XL} violates the order property.)
Methods for Expressing Attribute Test Condition
d) Continuous Attributes → Binary split
Example: Annual Income > 80K? → Yes / No
Methods for Expressing Attribute Test Condition
d) Continuous Attributes : Multiway split
Example: Annual Income → < 10K / {10K, 25K} / {25K, 50K} / {50K, 80K} / > 80K
Measures for Selecting the Best Split
Let p(i|t) denote the fraction of records belonging to class
i at a given node t.
In a two-class problem, the class distribution at any node
can be written as (p0, p1), where p1 = 1 - p0.
The measures developed for selecting the best split are
often based on the degree of impurity of the child nodes: a
node with class distribution (0, 1) has zero impurity, whereas a
node with a uniform class distribution (0.5, 0.5) has the highest impurity.
Examples of impurity measures include:
For a categorical target: Entropy, Gini, Classification error,
Information Gain Ratio, Chi-Square test.
For an interval target: Variance reduction, F-test.
Measures of Impurity (I)

\text{Entropy}(t) = -\sum_{i=0}^{c-1} p(i|t)\,\log_2 p(i|t)

\text{Gini}(t) = 1 - \sum_{i=0}^{c-1} [p(i|t)]^2

\text{Classification error}(t) = 1 - \max_i [p(i|t)]

where c is the number of classes and
0 \log_2 0 = 0 in the entropy calculations.
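A minimal sketch of these three measures in Python, where `p` is the list of class proportions p(i|t) at a node t:

```python
# The three impurity measures defined above.
from math import log2

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)   # convention: 0*log2(0) = 0

def gini(p):
    return 1.0 - sum(pi ** 2 for pi in p)

def classification_error(p):
    return 1.0 - max(p)

# A uniform two-class distribution has the highest impurity:
print(entropy([0.5, 0.5]),                # 1.0
      gini([0.5, 0.5]),                   # 0.5
      classification_error([0.5, 0.5]))   # 0.5
```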
Example : Measures of Impurity

Node     Class = 0   Class = 1
Parent   4           14
N1       0           6
N2       1           5
N3       3           3
Node N1:
Gini = 1 - (0/6)^2 - (6/6)^2 = 0
Entropy = -(0/6) \log_2(0/6) - (6/6) \log_2(6/6) = 0
Error = 1 - \max[(0/6), (6/6)] = 0
Node N2:
Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278
Entropy = -(1/6) \log_2(1/6) - (5/6) \log_2(5/6) = 0.65
Error = 1 - \max[(1/6), (5/6)] = 0.167
Node N3:
Gini = 1 - (3/6)^2 - (3/6)^2 = 0.5
Entropy = -(3/6) \log_2(3/6) - (3/6) \log_2(3/6) = 1
Error = 1 - \max[(3/6), (3/6)] = 0.5
Summary:

Node   Gini    Entropy   Error
N1     0       0         0
N2     0.278   0.650     0.167
N3     0.5     1         0.5

N1 has the lowest impurity value, followed by N2 and N3.
Measures of Impurity
To determine how well a test condition performs, we need to
compare the degree of impurity of the parent node (before
splitting) with the degree of impurity of the child nodes (after splitting).
The larger the difference, the better the test condition.
The gain, ∆, is a criterion that can be used to determine the
goodness of a split:

\Delta = I(\text{parent}) - \sum_{j=1}^{k} \frac{N(v_j)}{N}\, I(v_j)

Where:
I(.) is the impurity measure of a given node,
N is the total number of records at the parent node,
k is the number of attribute values (child nodes),
N(vj) is the number of records associated with child node vj.
The summation term is the weighted average impurity of the child nodes.
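A minimal sketch of the gain criterion ∆ in Python, reusing `gini()` from the earlier sketch; each child is given as a list of class counts, and the usage example uses hypothetical counts:

```python
# Gain of a split: parent impurity minus the weighted average impurity
# of the child nodes, as defined above.
def weighted_impurity(children, impurity):
    n = sum(sum(counts) for counts in children)        # N: records at the parent
    return sum(
        (sum(counts) / n) * impurity([c / sum(counts) for c in counts])
        for counts in children                         # sum of N(vj)/N * I(vj)
    )

def gain(parent_counts, children, impurity):
    parent_p = [c / sum(parent_counts) for c in parent_counts]
    return impurity(parent_p) - weighted_impurity(children, impurity)

# Hypothetical split of a (10, 10) parent into (8, 2) and (2, 8):
print(round(gain([10, 10], [[8, 2], [2, 8]], gini), 2))   # 0.18
```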
Measures of Impurity : Info. Gain Ratio
Since I(parent) is the same for all test conditions,
maximizing the gain is equivalent to minimizing the
weighted average impurity measure of the child
nodes.
When entropy is used as the impurity measure, the
difference in entropy is known as the information gain;
the Information Gain Ratio (IGR) normalizes this gain by the
entropy of the split itself, penalizing splits with many outcomes.
Decision trees built using entropy tend to be quite
bushy. Bushy trees with many multi-way splits are
undesirable, as these splits lead to small numbers of
records in each node.
Splitting Binary Attributes (using Gini)
Example :
Suppose there are two ways (A and B) to split the data into smaller subsets.

Parent : C0 = 6, C1 = 6
Gini = 1 - (6/12)^2 - (6/12)^2 = 0.5

Split A:
N1 : C0 = 4, C1 = 3 (Gini index: 0.4898)
N2 : C0 = 2, C1 = 3 (Gini index: 0.480)

Split B:
N1 : C0 = 1, C1 = 4 (Gini index: 0.320)
N2 : C0 = 5, C1 = 2 (Gini index: 0.4082)

Which one is a better split? Compute the weighted average of the
Gini index of both attributes.
Splitting Binary Attributes (using Gini)
Example (continued):

Split A, weighted average of the Gini index:
[(7/12) x 0.4898] + [(5/12) x 0.480] = 0.486
Gain, ∆ = 0.5 - 0.486 = 0.014

Split B, weighted average of the Gini index:
[(5/12) x 0.320] + [(7/12) x 0.4082] = 0.3715
Gain, ∆ = 0.5 - 0.3715 = 0.1285

Therefore, B is preferred.
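The two weighted averages can be checked with a short Python sketch, reusing `gini()` from the earlier sketch:

```python
# Weighted average Gini and gain for splits A and B above.
split_A = [[4, 3], [2, 3]]    # class counts (C0, C1) at N1, N2 under A
split_B = [[1, 4], [5, 2]]    # class counts (C0, C1) at N1, N2 under B

for name, split in (("A", split_A), ("B", split_B)):
    n = sum(map(sum, split))
    w = sum((sum(c) / n) * gini([x / sum(c) for x in c]) for c in split)
    print(name, round(w, 4), "gain:", round(0.5 - w, 4))
# A 0.4857 gain: 0.0143   (the slides round intermediate values: 0.486 / 0.014)
# B 0.3714 gain: 0.1286   (slides: 0.3715 / 0.1285)
```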
Splitting Nominal Attributes (using Gini)
Example : Which split is better? Binary or multi-way splits.

Binary split (i): Car Type → {Family} vs {Sports, Luxury}
Family : C0 = 1, C1 = 3 | Sports, Luxury : C0 = 9, C1 = 7
Weighted Average Gini = 0.468

Binary split (ii): Car Type → {Sports} vs {Family, Luxury}
Sports : C0 = 8, C1 = 0 | Family, Luxury : C0 = 2, C1 = 10
Weighted Average Gini = 0.167

Multiway split: Car Type → Family / Sports / Luxury
Family : C0 = 1, C1 = 3 | Sports : C0 = 8, C1 = 0 | Luxury : C0 = 1, C1 = 7
Weighted Average Gini = 0.163 ← the multiway split is preferred
Splitting Continuous Attributes (using Gini)
A brute-force method is used to find the best split position (v) for
a continuous attribute (e.g., Annual Income). To reduce complexity,
the training records are first sorted on annual income, and candidate
split positions (v) are identified by taking the midpoints between
adjacent sorted values. We then compute the Gini index for each
candidate and choose the one that gives the lowest value.

Class:          No   No   No   Yes  Yes  Yes  No   No   No   No
Annual Income:  60   70   75   85   90   95   100  120  125  220

Split position v:   55    65    72    80    87    92    97    110   122   172   230
Yes (<=v | >v):     0|3   0|3   0|3   0|3   1|2   2|1   3|0   3|0   3|0   3|0   3|0
No  (<=v | >v):     0|7   1|6   2|5   3|4   3|4   3|4   3|4   4|3   5|2   6|1   7|0
Gini:               0.420 0.400 0.375 0.343 0.417 0.400 0.300 0.343 0.375 0.400 0.420

The best split position is v = 97, which gives the lowest Gini index (0.300).
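A minimal sketch of this brute-force search in Python, reusing `gini()` from the earlier sketch and the Annual Income example data:

```python
# Brute-force search for the best binary split of a continuous attribute.
incomes = [60, 70, 75, 85, 90, 95, 100, 120, 125, 220]   # sorted values
labels  = ["No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "No"]

# Candidate positions: midpoints between adjacent values, plus the two ends.
candidates = [55] + [(a + b) / 2 for a, b in zip(incomes, incomes[1:])] + [230]

best = None
for v in candidates:
    sides = ([c for x, c in zip(incomes, labels) if x <= v],
             [c for x, c in zip(incomes, labels) if x > v])
    w = 0.0
    for side in sides:
        if side:                                          # skip empty partitions
            p = [side.count("Yes") / len(side), side.count("No") / len(side)]
            w += (len(side) / len(incomes)) * gini(p)     # weighted child impurity
    if best is None or w < best[1]:
        best = (v, w)
print(best)   # (97.5, 0.3) -- the slides truncate this midpoint to 97
```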