Machine Learning Methods for
Data Mining
Based on-
Data Mining: Concepts and Techniques
Han, Kamber & Pei
A.B.M. Ashikur Rahman
Asst. Professor,
Dept. of CSE, IUT
Data Mining
Knowledge Discovery from Data (KDD) process steps-
• Data Cleaning
• Data Integration
• Data Selection
• Data Transformation
• Pattern Mining
• Pattern Evaluation
• Knowledge Representation
e.g.-
Frequent itemsets,
Association rules (strong/weak)
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
• Supervision: The training data (observations, measurements, etc.) are
accompanied by labels indicating the class of the observations
• New data is classified based on the training set
• Unsupervised learning (clustering)
• The class labels of the training data are unknown
• Given a set of measurements, observations, etc., the aim is to establish the
existence of classes or clusters in the data
Prediction Problems: Classification vs. Numeric Prediction
• Classification
• predicts categorical class labels (discrete or nominal)
• constructs a model from the training set and the values (class labels) of a
classifying attribute, then uses the model to classify new data
• Numeric Prediction
• models continuous-valued functions, i.e., predicts unknown or missing values
• Typical applications
• Credit/loan approval:
• Medical diagnosis: whether a tumor is cancerous or benign
• Fraud detection: whether a transaction is fraudulent
• Web page categorization: which category a page belongs to
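The difference between the two kinds of prediction can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than material from the slides: the threshold rule stands in for a learned classifier, the toy numbers are made up, and the function names are hypothetical. The classifier returns a discrete label, while the numeric predictor returns a continuous value from a one-variable least-squares line.

```python
from statistics import mean

def classify_transaction(amount, threshold=1000.0):
    """Classification: map an input to a discrete class label.
    The fixed threshold is a hypothetical stand-in for a learned model."""
    return "fraudulent" if amount > threshold else "legitimate"

def fit_line(xs, ys):
    """Numeric prediction: fit y = a*x + b by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b

# Made-up toy data: an input variable and a continuous target
xs = [1, 2, 3, 4]
ys = [30.0, 35.0, 40.0, 45.0]
a, b = fit_line(xs, ys)

label = classify_transaction(2500.0)   # a category
predicted = a * 5 + b                  # a real number
print(label, predicted)
```

The output type is the point of the example: `label` is one of a fixed set of categories, whereas `predicted` can be any real number.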
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
• Each tuple/sample is assumed to belong to a predefined class, as determined by the class label
attribute
• The set of tuples used for model construction is the training set
• The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: for classifying future or unknown objects
• Estimate accuracy of the model
• The known label of each test sample is compared with the class predicted by the model
• Accuracy rate is the percentage of test set samples that are correctly classified by the model
• The test set must be independent of the training set (otherwise the estimate is inflated by overfitting)
• If the accuracy is acceptable, use the model to classify new data
• Note: If the test set is used to select models, it is called a validation (test) set
Process (1): Model Construction
Training Data
NAME    RANK             YEARS   TENURED
Mike    Assistant Prof   3       no
Mary    Assistant Prof   7       yes
Bill    Professor        2       yes
Jim     Associate Prof   7       yes
Dave    Assistant Prof   6       no
Anne    Associate Prof   3       no
Classification Algorithms
Classifier (Model)
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
Process (2): Using the Model in Prediction
Classifier
Testing Data
NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes
Unseen Data
(Jeff, Professor, 4)
Tenured?
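The two steps can be traced in a short Python sketch. The rule and the testing tuples are copied from the slides; the function and variable names are hypothetical, and matching the rank by exact string is an assumption made for illustration.

```python
def predict_tenured(rank, years):
    """Rule from the model-construction slide:
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, known label)
test_set = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Compare the known label of each test tuple with the model's output
correct = sum(
    predict_tenured(rank, years) == label
    for _, rank, years, label in test_set
)
accuracy = correct / len(test_set)
print(f"accuracy on test set: {accuracy:.0%}")

# Unseen data: (Jeff, Professor, 4)
jeff = predict_tenured("Professor", 4)
print("Jeff tenured?", jeff)
```

On these four test tuples the rule classifies three correctly; Merlisa (7 years, not tenured) is the one it gets wrong. For the unseen tuple (Jeff, Professor, 4) the rule answers 'yes'.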
Classification Methods
• Decision Tree Induction
• Naïve Bayesian Classification
• Rule-based Classification
• Bayesian Belief Network
• Support Vector Machine (SVM) etc.
What is Cluster Analysis?
• Cluster: A collection of data objects
• similar (or related) to one another within the same group
• dissimilar (or unrelated) to the objects in other groups
• Cluster analysis (or clustering, data segmentation, …)
• Finding similarities between data according to the characteristics found in the data
and grouping similar data objects into clusters
• Unsupervised learning: no predefined classes (i.e., learning by observation, as
opposed to supervised learning by examples)
• Typical applications
• As a stand-alone tool to get insight into data distribution
• As a preprocessing step for other algorithms
Clustering for Data Understanding and Applications
• Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
• Information retrieval: document clustering
• Land use: Identification of areas of similar land use in an earth observation database
• Marketing: Help marketers discover distinct groups in their customer bases, and then use this
knowledge to develop targeted marketing programs
• City-planning: Identifying groups of houses according to their house type, value, and geographical
location
• Earthquake studies: Observed earthquake epicenters should be clustered along continental faults
• Climate: understanding Earth's climate, finding atmospheric and ocean patterns
• Economic Science: market research
Clustering as a Preprocessing Tool (Utility)
• Summarization:
• Preprocessing for regression, PCA, classification, and association analysis
• Compression:
• Image processing: vector quantization
• Finding K-nearest Neighbors
• Localizing search to one or a small number of clusters
• Outlier detection
• Outliers are often viewed as those “far away” from any cluster
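The outlier idea can be sketched as follows. This assumes cluster centroids are already available from an earlier clustering step; the centroids, points, threshold value, and function names are all made up for illustration. A point counts as an outlier when its distance to the nearest centroid exceeds the threshold.

```python
import math

def nearest_centroid_distance(point, centroids):
    """Euclidean distance from a point to its closest cluster centroid."""
    return min(math.dist(point, c) for c in centroids)

def find_outliers(points, centroids, threshold):
    """Return the points lying farther than `threshold` from every centroid."""
    return [
        p for p in points
        if nearest_centroid_distance(p, centroids) > threshold
    ]

# Hypothetical centroids from a prior clustering run
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (9.8, 10.1), (5.0, 5.0), (0.1, -0.3)]

outliers = find_outliers(points, centroids, threshold=2.0)
print(outliers)
```

The threshold is the subjective part: what counts as "far away" depends on the application and on how spread out the clusters are.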
Quality: What Is Good Clustering?
• A good clustering method will produce high-quality clusters
• high intra-class similarity: cohesive within clusters
• low inter-class similarity: distinctive between clusters
• The quality of a clustering method depends on
• the similarity measure used by the method
• its implementation, and
• its ability to discover some or all of the hidden patterns
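One simple way to check cohesion and separation is to compare average distances, treating distance as dissimilarity so that a small within-cluster distance means high intra-class similarity. The sketch below is one possible measure among many, not the one prescribed by the slides; the toy clusters and function names are assumptions.

```python
import math
from itertools import combinations

def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster."""
    dists = [
        math.dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    ]
    return sum(dists) / len(dists)

def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    dists = [
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    ]
    return sum(dists) / len(dists)

# Made-up clustering result with two well-separated groups
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]
intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(f"intra = {intra:.3f}, inter = {inter:.3f}")
```

A clustering with cohesive, distinctive clusters should show `intra` much smaller than `inter`, as it does for this toy data.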
Measure the Quality of Clustering
• Dissimilarity/Similarity metric
• Similarity is expressed in terms of a distance function, typically a metric: d(i, j)
• The definitions of distance functions are usually rather different for
interval-scaled, Boolean, categorical, ordinal, ratio, and vector variables
• Weights should be associated with different variables based on applications and
data semantics
• Quality of clustering:
• There is usually a separate “quality” function that measures the “goodness” of a
cluster.
• It is hard to define “similar enough” or “good enough”
• The answer is typically highly subjective
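For numeric (interval-scaled) variables, a distance d(i, j) with per-variable weights can be sketched as a weighted Minkowski distance. This is one common choice and an assumption on our part; the slides do not fix a particular formula, and other variable types need different definitions. The function name and sample values are hypothetical.

```python
def weighted_minkowski(x, y, weights, p=2):
    """Weighted Minkowski distance between two numeric vectors.
    p=2 gives weighted Euclidean distance, p=1 weighted Manhattan distance."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    total = sum(w * abs(a - b) ** p for a, b, w in zip(x, y, weights))
    return total ** (1 / p)

i = (1.0, 2.0)
j = (4.0, 6.0)

d_equal = weighted_minkowski(i, j, weights=(1.0, 1.0))       # both variables count
d_first_only = weighted_minkowski(i, j, weights=(1.0, 0.0))  # second variable ignored
d_manhattan = weighted_minkowski(i, j, weights=(1.0, 1.0), p=1)
print(d_equal, d_first_only, d_manhattan)
```

Changing the weights changes which objects look similar, which is why the weights should follow the application and the data semantics.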
Major Clustering Approaches (I)
• Partitioning approach:
• Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum
of squared errors
• Typical methods: k-means, k-medoids, CLARANS
• Hierarchical approach:
• Create a hierarchical decomposition of the set of data (or objects) using some criterion
• Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
• Density-based approach:
• Based on connectivity and density functions
• Typical methods: DBSCAN, OPTICS, DENCLUE
• Grid-based approach:
• based on a multiple-level granularity structure
• Typical methods: STING, WaveCluster, CLIQUE
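As an illustration of the partitioning approach, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python, together with the sum of squared errors it tries to reduce. The toy points, the caller-supplied initial centroids, and the function names are assumptions made for this example; production code would normally rely on a library implementation with better initialization.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Basic k-means with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the partition is stable
        centroids = new_centroids
    return centroids, clusters

def sse(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(
        math.dist(p, c) ** 2
        for c, cluster in zip(centroids, clusters)
        for p in cluster
    )

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
error = sse(centroids, clusters)
print(centroids, error)
```

The result depends on the initial centroids, so k-means is usually run several times and the partition with the lowest sum of squared errors is kept.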