SlideShare a Scribd company logo
Data Mining:
Classification
Classification and Prediction
• What is classification? What is prediction?
• Issues regarding classification and prediction
• Classification by decision tree induction
• Classification:
– predicts categorical class labels
– classifies data (constructs a model) based on the
training set and the values (class labels) in a
classifying attribute and uses it in classifying new data
• Prediction:
– models continuous-valued functions, i.e., predicts
unknown or missing values
• Typical Applications
– credit approval
– target marketing
– medical diagnosis
– treatment effectiveness analysis
Classification vs. Prediction
Classification Learning: Definition
l Given a collection of records (training set)
– Each record contains a set of attributes, one of the
attributes is the class
l Find a model for the class attribute as a function
of the values of the other attributes
l Goal: previously unseen records should be
assigned a class as accurately as possible
– Use test set to estimate the accuracy of the model
– Often, the given data set is divided into training and test
sets, with training set used to build the model and test
set used to validate it.
Illustrating Classification Learning
Apply
Model
Induction
Deduction
Learn
Model
Model
Tid Attrib1 Attrib2 Attrib3 Class
1 Yes Large 125K No
2 No Medium 100K No
3 No Small 70K No
4 Yes Medium 120K No
5 No Large 95K Yes
6 No Medium 60K No
7 Yes Large 220K No
8 No Small 85K Yes
9 No Medium 75K No
10 No Small 90K Yes
10
Tid Attrib1 Attrib2 Attrib3 Class
11 No Small 55K ?
12 Yes Medium 80K ?
13 Yes Large 110K ?
14 No Small 95K ?
15 No Large 67K ?
10
Test Set
Learning
algorithm
Training Set
Examples of Classification Task
l Predicting tumor cells as benign or malignant
l Classifying credit card transactions
as legitimate or fraudulent
l Classifying secondary structures of protein
as alpha-helix, beta-sheet, or random
coil
l Categorizing news stories as finance,
weather, entertainment, sports, etc.
Classification - A Two-Step Process
• Model construction: describing a set of predetermined classes
– Building the Classifier or Model
– Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute
– The set of tuples used for model construction: training set
– The model is represented as classification rules, decision trees, or
mathematical formulae
• Model usage: for classifying future or unknown objects
– Using Classifier for Classification
– Estimate accuracy of the model
• The known label of test sample is compared with the classified result
from the model
• Accuracy rate is the percentage of test set samples that are correctly
classified by the model
• Test set is independent of training set, otherwise over-fitting will
occur
Classification Process (1): Model Construction
Example: Loan application
The data classification process: Learning: Training data are analyzed by a
classification algorithm. Here, the class label attribute is loan decision, and the
learned model or classifier is represented in the form of classification rules.
Classification Process (2): Use the Model
in Prediction
Classification: Test data are used to estimate the accuracy of the classification
rules. If the accuracy is considered acceptable, the rules can be applied to the
classification of new data tuples.
Supervised vs. Unsupervised
Learning
• Supervised learning (classification)
– Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
– New data is classified based on the training set
• Unsupervised learning (clustering)
– The class labels of training data is unknown
– Given a set of measurements, observations, etc. with
the aim of establishing the existence of classes or
clusters in the data
Issues (1): Data Preparation
• Data cleaning
– Preprocess data in order to reduce noise and handle
missing values
• Relevance analysis (feature selection)
– Remove the irrelevant or redundant attributes
• Data transformation
– Generalize and/or normalize data
Issues (2): Evaluating Classification
Methods
• Predictive accuracy
• Speed and scalability
– time to construct the model
– time to use the model
• Robustness
– handling noise and missing values
• Scalability
– efficiency in disk-resident databases
• Interpretability:
– understanding and insight provded by the model
• Goodness of rules
– decision tree size
– compactness of classification rules
The problem
• Given a set of training cases/objects and their attribute
values, try to determine the target attribute value of new
examples.
– Classification
– Prediction
• Use a decision tree to predict categories for new events.
• Use training data to build the decision tree.
New
Events
Decision
Tree
Category
Training
Events and
Categories
Ad

More Related Content

What's hot (20)

Kohonen self organizing maps
Kohonen self organizing mapsKohonen self organizing maps
Kohonen self organizing maps
raphaelkiminya
 
Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2
DigiGurukul
 
Data integration
Data integrationData integration
Data integration
kalavathisugan
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
Megha Sharma
 
Fuzzy expert system
Fuzzy expert systemFuzzy expert system
Fuzzy expert system
Hsuvas Borkakoty
 
Inference in First-Order Logic
Inference in First-Order Logic Inference in First-Order Logic
Inference in First-Order Logic
Junya Tanaka
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Eng Teong Cheah
 
Unit 1 - ML - Introduction to Machine Learning.pptx
Unit 1 - ML - Introduction to Machine Learning.pptxUnit 1 - ML - Introduction to Machine Learning.pptx
Unit 1 - ML - Introduction to Machine Learning.pptx
jawad184956
 
Clustering
ClusteringClustering
Clustering
Dr. C.V. Suresh Babu
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
Davis David
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
butest
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMS
koolkampus
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval Models
Nisha Arankandath
 
Knowledge Representation & Reasoning
Knowledge Representation & ReasoningKnowledge Representation & Reasoning
Knowledge Representation & Reasoning
Sajid Marwat
 
Decision tree
Decision treeDecision tree
Decision tree
ShraddhaPandey45
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
Student
 
Decision tree
Decision treeDecision tree
Decision tree
shivani saluja
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Girish Khanzode
 
CS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_ArchitectureCS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_Architecture
Palani Kumar
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
Krish_ver2
 
Kohonen self organizing maps
Kohonen self organizing mapsKohonen self organizing maps
Kohonen self organizing maps
raphaelkiminya
 
Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2
DigiGurukul
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
Megha Sharma
 
Inference in First-Order Logic
Inference in First-Order Logic Inference in First-Order Logic
Inference in First-Order Logic
Junya Tanaka
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Eng Teong Cheah
 
Unit 1 - ML - Introduction to Machine Learning.pptx
Unit 1 - ML - Introduction to Machine Learning.pptxUnit 1 - ML - Introduction to Machine Learning.pptx
Unit 1 - ML - Introduction to Machine Learning.pptx
jawad184956
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
Davis David
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
butest
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMS
koolkampus
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval Models
Nisha Arankandath
 
Knowledge Representation & Reasoning
Knowledge Representation & ReasoningKnowledge Representation & Reasoning
Knowledge Representation & Reasoning
Sajid Marwat
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
Student
 
CS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_ArchitectureCS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_Architecture
Palani Kumar
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
Krish_ver2
 

Similar to Lect8 Classification & prediction (20)

Data mining chapter04and5-best
Data mining chapter04and5-bestData mining chapter04and5-best
Data mining chapter04and5-best
ABDUmomo
 
Machine learning algorithms for data mining
Machine learning algorithms for data miningMachine learning algorithms for data mining
Machine learning algorithms for data mining
Ashikur Rahman
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdf
AschalewAyele2
 
Classification and Prediction.pptx
Classification and Prediction.pptxClassification and Prediction.pptx
Classification and Prediction.pptx
SandeepAgrawal84
 
Classification
ClassificationClassification
Classification
thamizh arasi
 
introducatio to ml introducatio to ml introducatio to ml
introducatio to ml introducatio to ml introducatio to mlintroducatio to ml introducatio to ml introducatio to ml
introducatio to ml introducatio to ml introducatio to ml
DecentMusicians
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
AsrithaKorupolu
 
2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts
Krish_ver2
 
Data mining techniques unit iv
Data mining techniques unit ivData mining techniques unit iv
Data mining techniques unit iv
malathieswaran29
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
321106410027
 
Unit - III Classification wjwjdbekwjwbdbekwk
Unit - III Classification wjwjdbekwjwbdbekwkUnit - III Classification wjwjdbekwjwbdbekwk
Unit - III Classification wjwjdbekwjwbdbekwk
mailmuzammil871
 
Machine learning
Machine learningMachine learning
Machine learning
deepakbagam
 
Lecturer3 by RamaKrishna SRU waranagal telanga
Lecturer3 by RamaKrishna SRU waranagal telangaLecturer3 by RamaKrishna SRU waranagal telanga
Lecturer3 by RamaKrishna SRU waranagal telanga
coolscools1231
 
CS 402 DATAMINING AND WAREHOUSING -MODULE 3
CS 402 DATAMINING AND WAREHOUSING -MODULE 3CS 402 DATAMINING AND WAREHOUSING -MODULE 3
CS 402 DATAMINING AND WAREHOUSING -MODULE 3
NIMMYRAJU
 
Unit-3 ML Modelling and Evaluation .pptx
Unit-3 ML Modelling and Evaluation .pptxUnit-3 ML Modelling and Evaluation .pptx
Unit-3 ML Modelling and Evaluation .pptx
Divya573916
 
Supply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
Supply Chain Analytics, Supply Chain Management, Supply Chain Data AnalyticsSupply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
Supply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
MujtabaAliKhan12
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
rajalakshmi5921
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
atul404633
 
++16516.pptx
++16516.pptx++16516.pptx
++16516.pptx
CoolDude357501
 
Data mining chapter04and5-best
Data mining chapter04and5-bestData mining chapter04and5-best
Data mining chapter04and5-best
ABDUmomo
 
Machine learning algorithms for data mining
Machine learning algorithms for data miningMachine learning algorithms for data mining
Machine learning algorithms for data mining
Ashikur Rahman
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdf
AschalewAyele2
 
Classification and Prediction.pptx
Classification and Prediction.pptxClassification and Prediction.pptx
Classification and Prediction.pptx
SandeepAgrawal84
 
introducatio to ml introducatio to ml introducatio to ml
introducatio to ml introducatio to ml introducatio to mlintroducatio to ml introducatio to ml introducatio to ml
introducatio to ml introducatio to ml introducatio to ml
DecentMusicians
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
AsrithaKorupolu
 
2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts
Krish_ver2
 
Data mining techniques unit iv
Data mining techniques unit ivData mining techniques unit iv
Data mining techniques unit iv
malathieswaran29
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
321106410027
 
Unit - III Classification wjwjdbekwjwbdbekwk
Unit - III Classification wjwjdbekwjwbdbekwkUnit - III Classification wjwjdbekwjwbdbekwk
Unit - III Classification wjwjdbekwjwbdbekwk
mailmuzammil871
 
Machine learning
Machine learningMachine learning
Machine learning
deepakbagam
 
Lecturer3 by RamaKrishna SRU waranagal telanga
Lecturer3 by RamaKrishna SRU waranagal telangaLecturer3 by RamaKrishna SRU waranagal telanga
Lecturer3 by RamaKrishna SRU waranagal telanga
coolscools1231
 
CS 402 DATAMINING AND WAREHOUSING -MODULE 3
CS 402 DATAMINING AND WAREHOUSING -MODULE 3CS 402 DATAMINING AND WAREHOUSING -MODULE 3
CS 402 DATAMINING AND WAREHOUSING -MODULE 3
NIMMYRAJU
 
Unit-3 ML Modelling and Evaluation .pptx
Unit-3 ML Modelling and Evaluation .pptxUnit-3 ML Modelling and Evaluation .pptx
Unit-3 ML Modelling and Evaluation .pptx
Divya573916
 
Supply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
Supply Chain Analytics, Supply Chain Management, Supply Chain Data AnalyticsSupply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
Supply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
MujtabaAliKhan12
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
rajalakshmi5921
 
Ad

More from hktripathy (18)

Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematics
hktripathy
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
hktripathy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
hktripathy
 
Lecture7.1 data sampling
Lecture7.1 data samplingLecture7.1 data sampling
Lecture7.1 data sampling
hktripathy
 
Lecture5 virtualization
Lecture5 virtualizationLecture5 virtualization
Lecture5 virtualization
hktripathy
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
hktripathy
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
hktripathy
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligence
hktripathy
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
hktripathy
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
hktripathy
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision tree
hktripathy
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
hktripathy
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
hktripathy
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
hktripathy
 
Lect4 principal component analysis-I
Lect4 principal component analysis-ILect4 principal component analysis-I
Lect4 principal component analysis-I
hktripathy
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mining
hktripathy
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
hktripathy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
hktripathy
 
Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematics
hktripathy
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
hktripathy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
hktripathy
 
Lecture7.1 data sampling
Lecture7.1 data samplingLecture7.1 data sampling
Lecture7.1 data sampling
hktripathy
 
Lecture5 virtualization
Lecture5 virtualizationLecture5 virtualization
Lecture5 virtualization
hktripathy
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
hktripathy
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
hktripathy
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligence
hktripathy
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
hktripathy
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
hktripathy
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision tree
hktripathy
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
hktripathy
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
hktripathy
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
hktripathy
 
Lect4 principal component analysis-I
Lect4 principal component analysis-ILect4 principal component analysis-I
Lect4 principal component analysis-I
hktripathy
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mining
hktripathy
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
hktripathy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
hktripathy
 
Ad

Recently uploaded (20)

How to Share Accounts Between Companies in Odoo 18
How to Share Accounts Between Companies in Odoo 18How to Share Accounts Between Companies in Odoo 18
How to Share Accounts Between Companies in Odoo 18
Celine George
 
All About the 990 Unlocking Its Mysteries and Its Power.pdf
All About the 990 Unlocking Its Mysteries and Its Power.pdfAll About the 990 Unlocking Its Mysteries and Its Power.pdf
All About the 990 Unlocking Its Mysteries and Its Power.pdf
TechSoup
 
puzzle Irregular Verbs- Simple Past Tense
puzzle Irregular Verbs- Simple Past Tensepuzzle Irregular Verbs- Simple Past Tense
puzzle Irregular Verbs- Simple Past Tense
OlgaLeonorTorresSnch
 
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon DolabaniHistory Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
fruinkamel7m
 
Ajanta Paintings: Study as a Source of History
Ajanta Paintings: Study as a Source of HistoryAjanta Paintings: Study as a Source of History
Ajanta Paintings: Study as a Source of History
Virag Sontakke
 
Classification of mental disorder in 5th semester bsc. nursing and also used ...
Classification of mental disorder in 5th semester bsc. nursing and also used ...Classification of mental disorder in 5th semester bsc. nursing and also used ...
Classification of mental disorder in 5th semester bsc. nursing and also used ...
parmarjuli1412
 
How to Clean Your Contacts Using the Deduplication Menu in Odoo 18
How to Clean Your Contacts Using the Deduplication Menu in Odoo 18How to Clean Your Contacts Using the Deduplication Menu in Odoo 18
How to Clean Your Contacts Using the Deduplication Menu in Odoo 18
Celine George
 
How to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 PurchaseHow to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 Purchase
Celine George
 
How to Configure Public Holidays & Mandatory Days in Odoo 18
How to Configure Public Holidays & Mandatory Days in Odoo 18How to Configure Public Holidays & Mandatory Days in Odoo 18
How to Configure Public Holidays & Mandatory Days in Odoo 18
Celine George
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
Search Matching Applicants in Odoo 18 - Odoo Slides
Search Matching Applicants in Odoo 18 - Odoo SlidesSearch Matching Applicants in Odoo 18 - Odoo Slides
Search Matching Applicants in Odoo 18 - Odoo Slides
Celine George
 
Pope Leo XIV, the first Pope from North America.pptx
Pope Leo XIV, the first Pope from North America.pptxPope Leo XIV, the first Pope from North America.pptx
Pope Leo XIV, the first Pope from North America.pptx
Martin M Flynn
 
The role of wall art in interior designing
The role of wall art in interior designingThe role of wall art in interior designing
The role of wall art in interior designing
meghaark2110
 
Transform tomorrow: Master benefits analysis with Gen AI today webinar, 30 A...
Transform tomorrow: Master benefits analysis with Gen AI today webinar,  30 A...Transform tomorrow: Master benefits analysis with Gen AI today webinar,  30 A...
Transform tomorrow: Master benefits analysis with Gen AI today webinar, 30 A...
Association for Project Management
 
UPMVLE migration to ARAL. A step- by- step guide
UPMVLE migration to ARAL. A step- by- step guideUPMVLE migration to ARAL. A step- by- step guide
UPMVLE migration to ARAL. A step- by- step guide
abmerca
 
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
Dr. Nasir Mustafa
 
*"Sensing the World: Insect Sensory Systems"*
*"Sensing the World: Insect Sensory Systems"**"Sensing the World: Insect Sensory Systems"*
*"Sensing the World: Insect Sensory Systems"*
Arshad Shaikh
 
antiquity of writing in ancient India- literary & archaeological evidence
antiquity of writing in ancient India- literary & archaeological evidenceantiquity of writing in ancient India- literary & archaeological evidence
antiquity of writing in ancient India- literary & archaeological evidence
PrachiSontakke5
 
Cultivation Practice of Garlic in Nepal.pptx
Cultivation Practice of Garlic in Nepal.pptxCultivation Practice of Garlic in Nepal.pptx
Cultivation Practice of Garlic in Nepal.pptx
UmeshTimilsina1
 
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptxANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
Mayuri Chavan
 
How to Share Accounts Between Companies in Odoo 18
How to Share Accounts Between Companies in Odoo 18How to Share Accounts Between Companies in Odoo 18
How to Share Accounts Between Companies in Odoo 18
Celine George
 
All About the 990 Unlocking Its Mysteries and Its Power.pdf
All About the 990 Unlocking Its Mysteries and Its Power.pdfAll About the 990 Unlocking Its Mysteries and Its Power.pdf
All About the 990 Unlocking Its Mysteries and Its Power.pdf
TechSoup
 
puzzle Irregular Verbs- Simple Past Tense
puzzle Irregular Verbs- Simple Past Tensepuzzle Irregular Verbs- Simple Past Tense
puzzle Irregular Verbs- Simple Past Tense
OlgaLeonorTorresSnch
 
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon DolabaniHistory Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
fruinkamel7m
 
Ajanta Paintings: Study as a Source of History
Ajanta Paintings: Study as a Source of HistoryAjanta Paintings: Study as a Source of History
Ajanta Paintings: Study as a Source of History
Virag Sontakke
 
Classification of mental disorder in 5th semester bsc. nursing and also used ...
Classification of mental disorder in 5th semester bsc. nursing and also used ...Classification of mental disorder in 5th semester bsc. nursing and also used ...
Classification of mental disorder in 5th semester bsc. nursing and also used ...
parmarjuli1412
 
How to Clean Your Contacts Using the Deduplication Menu in Odoo 18
How to Clean Your Contacts Using the Deduplication Menu in Odoo 18How to Clean Your Contacts Using the Deduplication Menu in Odoo 18
How to Clean Your Contacts Using the Deduplication Menu in Odoo 18
Celine George
 
How to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 PurchaseHow to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 Purchase
Celine George
 
How to Configure Public Holidays & Mandatory Days in Odoo 18
How to Configure Public Holidays & Mandatory Days in Odoo 18How to Configure Public Holidays & Mandatory Days in Odoo 18
How to Configure Public Holidays & Mandatory Days in Odoo 18
Celine George
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
Search Matching Applicants in Odoo 18 - Odoo Slides
Search Matching Applicants in Odoo 18 - Odoo SlidesSearch Matching Applicants in Odoo 18 - Odoo Slides
Search Matching Applicants in Odoo 18 - Odoo Slides
Celine George
 
Pope Leo XIV, the first Pope from North America.pptx
Pope Leo XIV, the first Pope from North America.pptxPope Leo XIV, the first Pope from North America.pptx
Pope Leo XIV, the first Pope from North America.pptx
Martin M Flynn
 
The role of wall art in interior designing
The role of wall art in interior designingThe role of wall art in interior designing
The role of wall art in interior designing
meghaark2110
 
Transform tomorrow: Master benefits analysis with Gen AI today webinar, 30 A...
Transform tomorrow: Master benefits analysis with Gen AI today webinar,  30 A...Transform tomorrow: Master benefits analysis with Gen AI today webinar,  30 A...
Transform tomorrow: Master benefits analysis with Gen AI today webinar, 30 A...
Association for Project Management
 
UPMVLE migration to ARAL. A step- by- step guide
UPMVLE migration to ARAL. A step- by- step guideUPMVLE migration to ARAL. A step- by- step guide
UPMVLE migration to ARAL. A step- by- step guide
abmerca
 
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
Dr. Nasir Mustafa
 
*"Sensing the World: Insect Sensory Systems"*
*"Sensing the World: Insect Sensory Systems"**"Sensing the World: Insect Sensory Systems"*
*"Sensing the World: Insect Sensory Systems"*
Arshad Shaikh
 
antiquity of writing in ancient India- literary & archaeological evidence
antiquity of writing in ancient India- literary & archaeological evidenceantiquity of writing in ancient India- literary & archaeological evidence
antiquity of writing in ancient India- literary & archaeological evidence
PrachiSontakke5
 
Cultivation Practice of Garlic in Nepal.pptx
Cultivation Practice of Garlic in Nepal.pptxCultivation Practice of Garlic in Nepal.pptx
Cultivation Practice of Garlic in Nepal.pptx
UmeshTimilsina1
 
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptxANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
Mayuri Chavan
 

Lect8 Classification & prediction

  • 2. Classification and Prediction • What is classification? What is prediction? • Issues regarding classification and prediction • Classification by decision tree induction
  • 3. • Classification: – predicts categorical class labels – classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data • Prediction: – models continuous-valued functions, i.e., predicts unknown or missing values • Typical Applications – credit approval – target marketing – medical diagnosis – treatment effectiveness analysis Classification vs. Prediction
  • 4. Classification Learning: Definition l Given a collection of records (training set) – Each record contains a set of attributes, one of the attributes is the class l Find a model for the class attribute as a function of the values of the other attributes l Goal: previously unseen records should be assigned a class as accurately as possible – Use test set to estimate the accuracy of the model – Often, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.
  • 5. Illustrating Classification Learning Apply Model Induction Deduction Learn Model Model Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K No 2 No Medium 100K No 3 No Small 70K No 4 Yes Medium 120K No 5 No Large 95K Yes 6 No Medium 60K No 7 Yes Large 220K No 8 No Small 85K Yes 9 No Medium 75K No 10 No Small 90K Yes 10 Tid Attrib1 Attrib2 Attrib3 Class 11 No Small 55K ? 12 Yes Medium 80K ? 13 Yes Large 110K ? 14 No Small 95K ? 15 No Large 67K ? 10 Test Set Learning algorithm Training Set
  • 6. Examples of Classification Task l Predicting tumor cells as benign or malignant l Classifying credit card transactions as legitimate or fraudulent l Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil l Categorizing news stories as finance, weather, entertainment, sports, etc.
  • 7. Classification - A Two-Step Process • Model construction: describing a set of predetermined classes – Building the Classifier or Model – Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute – The set of tuples used for model construction: training set – The model is represented as classification rules, decision trees, or mathematical formulae • Model usage: for classifying future or unknown objects – Using Classifier for Classification – Estimate accuracy of the model • The known label of test sample is compared with the classified result from the model • Accuracy rate is the percentage of test set samples that are correctly classified by the model • Test set is independent of training set, otherwise over-fitting will occur
  • 8. Classification Process (1): Model Construction Example: Loan application The data classification process: Learning: Training data are analyzed by a classification algorithm. Here, the class label attribute is loan decision, and the learned model or classifier is represented in the form of classification rules.
  • 9. Classification Process (2): Use the Model in Prediction Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
  • 10. Supervised vs. Unsupervised Learning • Supervised learning (classification) – Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations – New data is classified based on the training set • Unsupervised learning (clustering) – The class labels of training data is unknown – Given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data
  • 11. Issues (1): Data Preparation • Data cleaning – Preprocess data in order to reduce noise and handle missing values • Relevance analysis (feature selection) – Remove the irrelevant or redundant attributes • Data transformation – Generalize and/or normalize data
  • 12. Issues (2): Evaluating Classification Methods • Predictive accuracy • Speed and scalability – time to construct the model – time to use the model • Robustness – handling noise and missing values • Scalability – efficiency in disk-resident databases • Interpretability: – understanding and insight provded by the model • Goodness of rules – decision tree size – compactness of classification rules
  • 13. The problem • Given a set of training cases/objects and their attribute values, try to determine the target attribute value of new examples. – Classification – Prediction • Use a decision tree to predict categories for new events. • Use training data to build the decision tree. New Events Decision Tree Category Training Events and Categories
  翻译: