Introduction to Datamining: A Practical View
Created by: Ngô Tùng Sơn
Part 1
Schedule:
1. Example of Datamining
2. What and Where is Datamining in the System
3. Datamining Techniques
• Data Preprocessing
• Data Analysis
• Data Visualization
What does the data look like?
X Y
3 3
3 1
2 2
4 6
2 3
6 7
7 5
5 6
Can we learn something from this?
Each row represents an object, and the columns represent its attributes.
Ex: Can we identify the groups these objects belong to? YES
1. Example of Datamining
Now forget the table and treat each row as a point. We get:
[Figure: scatter plot of the points, X axis and Y axis from 0 to 8, with points A, B, and C labeled]
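From each data point, we find its neighbors by scanning a circle of radius r around it.
For example, A has 2 neighbors, B and C, denoted A{B,C}.
[Figure: circle of radius r drawn around a point; point D is also labeled]
A and D have the same neighbors, so they are considered neighbors of each other.
The same applies to the other points: B{A,B,C,D}, C{A,B,C,D}, D{B,C}.
Points that are connected through their neighborhoods are placed in the same group.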
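The neighbor-scanning idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the slides do not give a value for r, so r = 2.5 is chosen here because it happens to separate the eight points of the table, and the function name group_by_radius is hypothetical. Groups are built by chaining neighbors, which is one reading of the rule stated on the slide.

```python
from math import dist

# The eight (X, Y) rows from the example table.
POINTS = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]


def group_by_radius(points, r):
    """Group points that are linked by a chain of neighbors within radius r."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = [start], [start]
        while frontier:
            current = frontier.pop()
            # Neighbors of the current point that are not in any group yet.
            near = [j for j in unvisited if dist(points[current], points[j]) <= r]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        groups.append(sorted(points[i] for i in group))
    return sorted(groups)


# r = 2.5 is an assumed radius, not a value taken from the slides.
groups = group_by_radius(POINTS, r=2.5)
for g in groups:
    print(g)
```

With a smaller radius some points end up alone, and with a much larger one everything merges into a single group, so the result depends strongly on r.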
Finally, after considering all points, we have 2 groups.
[Figure: scatter plot of the grouped points, X axis and Y axis from 0 to 8]
What do we see here?
The data was not labeled with groups in advance, but we have now found the groups ourselves.
This is an example of a technique called CLUSTERING in DATAMINING.
2. What and Where is Datamining in the System
So, what exactly is Datamining?
Datamining is a set of tools and techniques for retrieving hidden knowledge and rules from data.
The name "datamining" can be misleading: the data is already there, so we do not need to "mine" it. What we mine is the knowledge hidden in it.
For ore mining, you need hammers and shovels.
For datamining, you need mathematics, statistics and probability, machine learning, computer programming, database techniques, ...
Where is Datamining in the system?
1. Employees/Staff: day by day, the staff use software (web, desktop, or mobile applications) and generate data by recording all of their business activities (customers, products, order details, contracts, ...).
2. Databases: the data is added to a database through online transaction processing (OLTP). There may be several such databases.
3. Integration Service: data from the several data sources (OLTP) is collected into a common repository, the data warehouse.
4. Data Mining: the datamining service accesses the data warehouse to process the data.
3. Datamining Techniques
What are the techniques in Datamining?
Many techniques can be applied in datamining.
Broadly, we can classify them into 3 groups / phases:
• Data Preprocessing
• Data Analysis
• Data Presentation (Visualization)
Data-Preprocessing
Collected data is often of poor quality, so it is necessary to clean, format, and transform it before analyzing it.
This is a very important process, and it is hard to describe in a single abstract way.
Here we will see a few examples of data preprocessing techniques:
• Similarity Measure
• Down Sampling
• Dimension Reduction
• Vectorization
Data-Preprocessing: Similarity Measure
How can we know which objects are similar?
Consider three points A(x1,y1), B(x2,y2), and C(x3,y3).
Option 1: measure the distance D1 between A and B, and the distance D2 between A and C.
If D1 < D2, then A is more similar to B than to C.
Option 2: every point can be represented as a vector. Measure the angle between each pair of vectors: α between A and B, and β between A and C.
If α < β, then A is more similar to B than to C.
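Both similarity measures are easy to compute. The sketch below uses made-up coordinates for A, B, and C (the slides give none), and the helper name angle_between is hypothetical; it only illustrates the distance and angle comparisons described above.

```python
from math import acos, degrees, dist, hypot


def angle_between(u, v):
    """Angle in degrees between two 2-D vectors, from the cosine formula."""
    dot = u[0] * v[0] + u[1] * v[1]
    cosine = dot / (hypot(*u) * hypot(*v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return degrees(acos(max(-1.0, min(1.0, cosine))))


# Hypothetical coordinates, chosen only for illustration.
A, B, C = (2.0, 2.0), (3.0, 2.5), (1.0, 5.0)

d1 = dist(A, B)              # distance between A and B
d2 = dist(A, C)              # distance between A and C
alpha = angle_between(A, B)  # angle between vectors A and B
beta = angle_between(A, C)   # angle between vectors A and C

print("closer by distance:", "B" if d1 < d2 else "C")
print("closer by angle:", "B" if alpha < beta else "C")
```

The two measures do not always agree: distance depends on how far apart the points are, while the angle only depends on the direction of the vectors.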
Data-Preprocessing: Down Sampling
What if you have so much data that analyzing all of it is unnecessary and hurts performance?
Just pick some of the objects to evaluate.
Example: divide the space into a grid with cell size g and keep only one object per cell.
[Figure: original data and down-sampled data on a grid with cell size g]
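A minimal sketch of grid-based down sampling, assuming that "one object per cell" means keeping the first object seen in each cell (the slides do not say which one to keep). The function name grid_downsample and the sample points are hypothetical.

```python
def grid_downsample(points, g):
    """Keep the first point that falls into each g-by-g grid cell."""
    kept = {}
    for x, y in points:
        cell = (int(x // g), int(y // g))  # integer index of the cell containing the point
        kept.setdefault(cell, (x, y))      # later points in the same cell are dropped
    return list(kept.values())


# Hypothetical sample data: five points spread over three cells of size 1.
sample = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.1), (1.9, 0.8), (0.5, 1.5)]
reduced = grid_downsample(sample, g=1.0)
print(reduced)
```

Other choices are possible, for example keeping the point closest to the cell center or the average of the points in the cell.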
Data-Preprocessing: Dimension Reduction
All the example data presented so far has 2 dimensions, i.e. 2 attributes (X, Y). What if each object had ~10,000 attributes?
This could reduce the performance (and/or accuracy) of data analysis algorithms, so we need some way to reduce the number of dimensions.
Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are two commonly used methods for this.
Data-Preprocessing: Dimension Reduction - PCA
[Figure: original data on the X and Y axes, and the same data projected onto the principal components P1 and P2]
We keep only the k principal components with the highest eigenvalues. In the example above, we can let k = 1 and keep P1 instead of both P1 and P2.
In this way the number of dimensions is reduced.
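For 2-D data the PCA computation can be written out by hand, because the eigenvalues of a 2x2 covariance matrix have a closed form. The sketch below is an illustration only: it reuses the eight points from the first example, and the function name pca_2d_to_1d is hypothetical. Real data with many attributes would normally be handled by a linear algebra library.

```python
from math import hypot, sqrt

POINTS = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component (k = 1)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(dx * dx for dx, _ in centered) / (n - 1)
    syy = sum(dy * dy for _, dy in centered) / (n - 1)
    sxy = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    gap = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + gap, half_trace - gap

    # Eigenvector for the largest eigenvalue, normalized to unit length.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = hypot(vx, vy)
    p1 = (vx / norm, vy / norm)

    # One coordinate per point: its position along P1.
    projected = [dx * p1[0] + dy * p1[1] for dx, dy in centered]
    return p1, (lam1, lam2), projected


p1, (lam1, lam2), projected = pca_2d_to_1d(POINTS)
print("share of variance kept by P1:", lam1 / (lam1 + lam2))
```

The ratio printed at the end indicates how much of the spread in the data survives when only P1 is kept.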
Data-Preprocessing: Vectorization
Most data analysis algorithms take a set of vectors as input, so we need to transform the collected data into a set of vectors.
Ex: Given a document: "Mr A has not passed the exam this year. He will do it again next year"
Some important words are extracted, such as "Mr A", "not", "pass", "exam", "again", "next", "year".
Counting the frequency of each word gives the vector that represents the document:
Mr A | not | pass | exam | again | next | year
1    | 1   | 1    | 1    | 1     | 1    | 2
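The word-count vector for this document can be reproduced with a short script. This is a sketch under assumptions: the list of important words is taken from the slide, "Mr A" is joined into a single token, and the NORMALIZE table is a hypothetical stand-in for a real stemming step that maps "passed" to "pass".

```python
import re
from collections import Counter

DOCUMENT = "Mr A has not passed the exam this year. He will do it again next year"

# Important words from the slide; "Mr A" is treated as the single token "mr_a".
VOCABULARY = ["mr_a", "not", "pass", "exam", "again", "next", "year"]

# Hypothetical stand-in for stemming: map inflected forms to their base word.
NORMALIZE = {"passed": "pass"}


def vectorize(text, vocabulary):
    """Return the frequency of each vocabulary word in the text."""
    text = re.sub(r"\bmr a\b", "mr_a", text.lower())
    tokens = [NORMALIZE.get(t, t) for t in re.findall(r"[a-z_]+", text)]
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vector = vectorize(DOCUMENT, VOCABULARY)
print(vector)
```

Words that are not in the vocabulary, such as "has" or "the", are simply ignored.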
Data Analysis
This is the most important phase, where we find the hidden knowledge and rules in the data.
There are many techniques in this phase:
• Clustering
• Classification
• Regression
• Rule Bases
• ...
Data Analysis: Clustering
Clustering is the process of grouping objects into groups (clusters).
Objects in the same cluster are similar to each other, and objects in different clusters are not.
There are 2 types of clustering: partitional and hierarchical.
In this presentation we look at an example of the most famous clustering method: K-Means.
Data Analysis: Clustering - K-Means Algorithm
1. Randomly select K centers (centroids), one for each of the K clusters.
2. Calculate the distance from each object to each of the K centroids.
3. Assign each object to the group of its nearest centroid.
4. Compute the new centroid of each group.
5. Repeat from step 2 until the group assignments no longer change.
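The five steps can be sketched as follows. To keep the run reproducible, this sketch takes the initial centroids as an argument instead of choosing them randomly as in step 1; the data is the eight points from the first example, and the function name k_means is hypothetical.

```python
from math import dist

POINTS = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]


def k_means(points, initial_centroids, max_iter=100):
    """Plain K-Means: returns the final centroids and each point's group index."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Steps 2 and 3: assign every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Step 5: stop when the groups no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its group.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Step 1 (simplified): two data points are used as the starting centroids.
centroids, assignment = k_means(POINTS, [POINTS[0], POINTS[3]])
print(centroids, assignment)
```

With random starting centroids the algorithm can converge to a different grouping, which is why it is often run several times.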
Data Analysis: Clustering - K-Means Algorithm (example)
Consider the data below and plot it.
[Figure: plots of the data and of the centroid positions at each step]
Select K = 2 centroids, then compute the new positions of the centroids.
Finally, the centroids stop changing.
Each object belongs to the group of its closest centroid.
The key point of the algorithm is to select a good K.
Data Analysis: Classification
How can we identify the group of an unclassified object?
We could perform clustering to do this.
However, what if we already know the groups of some objects classified in the past? Can we do better than clustering? YES.
We can construct a prediction model that predicts the group of unclassified objects based on the classified objects.
This process is called CLASSIFICATION.
The process of classification can be described as follows:
[Figure: a learning algorithm produces a model]
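Data Analysis: Classification - SVM
Support Vector Machine (SVM) is one of the best-known classification methods. In its basic form it is a linear classifier.
For example, take training data classified as red and blue.
[Figure: red and blue training points separated by a line, with an unclassified point marked "?"]
Classification model:
$w$: normal vector of the separating line
$b$: bias, which sets the offset of the line from the origin (the distance is $|b| / \lVert w \rVert$)
$\begin{bmatrix} x & y \end{bmatrix} w + b > 0 \rightarrow$ blue
$\begin{bmatrix} x & y \end{bmatrix} w + b < 0 \rightarrow$ red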
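Once w and b are known, applying the model is a single dot product. The sketch below shows only this decision rule; it does not train an SVM, and the values of w and b are hypothetical (they describe the line x + y = 8), as is the function name classify.

```python
def classify(point, w, b):
    """Apply the linear decision rule: the sign of [x y] w + b picks the class."""
    score = point[0] * w[0] + point[1] * w[1] + b
    if score > 0:
        return "blue"
    if score < 0:
        return "red"
    return "on the boundary"


# Hypothetical model parameters, not learned from data: the line x + y = 8.
w, b = (1.0, 1.0), -8.0

print(classify((6, 7), w, b))  # point on the positive side of the line
print(classify((2, 2), w, b))  # point on the negative side of the line
```

Training an SVM means choosing w and b so that the line separates the two classes with the widest possible margin.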
Data Analysis: Regression
Regression is also used for prediction, but it predicts the missing value of a numeric attribute rather than a group.
For example:
[Figure: data points on the X and Y axes, with x_i and y_i marked]
• How do we find y_i if x_i is known?
• We can estimate the line that describes the data.
• Plug x_i into the line equation to find y_i.
• This is an example of Linear Regression.
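Estimating the line and plugging in x_i can be sketched with the ordinary least-squares formulas. The data here is made up so that it lies exactly on y = 2x + 1, and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimate of slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying on the line y = 2x + 1.
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

x_i = 5                          # known attribute value
y_i = slope * x_i + intercept    # predicted missing value
print(slope, intercept, y_i)
```

With noisy real data the points do not lie exactly on the line, and the fitted line is the one that minimizes the squared vertical errors.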
Data Analysis: Rule Base
Rule base techniques find hidden patterns in the data.
Examples of rules and the kind of pattern each represents:
• Customers who normally buy rice also buy vegetables (frequent pattern)
• Young people want more expensive phones than others (gradual pattern)
• People buy a laptop before buying a cell phone (sequential pattern)
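For the frequent-pattern case, a rule such as "rice implies vegetable" is usually judged by its support and confidence. The sketch below computes both on a tiny set of made-up transactions; the data and the function names are hypothetical, and this is only the scoring step, not a full pattern-mining algorithm.

```python
# Hypothetical shopping baskets, one set of items per customer visit.
TRANSACTIONS = [
    {"rice", "vegetable"},
    {"rice", "vegetable", "fish"},
    {"rice"},
    {"bread"},
]


def support(transactions, items):
    """Fraction of transactions that contain all the given items."""
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    base = support(transactions, antecedent)
    return support(transactions, antecedent | consequent) / base if base else 0.0


rule_support = support(TRANSACTIONS, {"rice", "vegetable"})
rule_confidence = confidence(TRANSACTIONS, {"rice"}, {"vegetable"})
print(rule_support, rule_confidence)
```

A rule is typically reported only when both values exceed thresholds chosen by the analyst.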
Data Visualization
Techniques to present the knowledge you have retrieved to the user.
[Figure: example chart of three series across four categories, Y axis from 0 to 14]
Category    Series 1  Series 2  Series 3
Category 1  4.3       2.4       2
Category 2  2.5       4.4       2
Category 3  3.5       1.8       3
Category 4  4.5       2.8       5
Thank you for your attention
Introduction to Datamining Concept and Techniques

  • 1. Introduction to Datamining using Practical View Created : Ngô Tùng Sơn Part 1
  • 2. Schedule: 1. Example of Datamining 2. What and Where is Datamining in the System 3. Datamining Techniques  Data preprocessing  Data Analysis  Data Visualization
  • 3. How data look like? X Y 3 3 3 1 2 2 4 6 2 3 6 7 7 5 5 6 Can we get some thing from this? The row represents an object and its columns represent its attributes Ex: can we identify the group of these objects? YES 1. Example of Datamining
  • 4. Now, forget the table, consider a row as a point then we have 0 2 4 6 8 0 2 4 6 8 X Y B A C From each data point, we find its neighbors by scanning with a radius r . For Example : A will have 2 Neighbors B and C , denoted: A{B,C} r D A and D have same neighbors so they are considered as neighbors Same for B {A,B,C,D} ,C{A,B,C,D}, D{B,C} The points have neighborhood will be in the same group. 1. Example of Datamining
  • 5. 1. Example of Data Mining. After considering all points, we end up with 2 groups. [Figure: the same scatter plot with the points split into two groups.] What do we see here? The data was never labeled with groups, yet we have now found them. This is just one example of a data mining technique called CLUSTERING.
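
The grouping idea from slides 4 and 5 can be sketched in a few lines of Python. This is a simplified reading of the slides, not the author's exact procedure: two points are linked when they lie within radius r of each other, and every chain of linked points forms one group. The radius `r=2.5` and the function name `radius_groups` are assumptions chosen for illustration; the points are the eight rows from slide 3.

```python
from math import dist

# The eight (X, Y) rows from slide 3.
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]

def radius_groups(points, r):
    """Group points by linking any two points whose distance is at most r."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = [start], [start]
        while frontier:
            current = frontier.pop()
            # Neighbors of the current point that have not been grouped yet.
            near = [i for i in unvisited if dist(points[current], points[i]) <= r]
            for i in near:
                unvisited.remove(i)
            group.extend(near)
            frontier.extend(near)
        groups.append(sorted(points[i] for i in group))
    return sorted(groups)

# r = 2.5 is an assumed radius; the slides do not state the value they used.
groups = radius_groups(points, r=2.5)
for number, group in enumerate(groups, start=1):
    print(f"group {number}: {group}")
```

With this assumed radius the sketch finds two groups, which matches the count on slide 5. A smaller or larger radius would split or merge them, so the choice of r matters.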
  • 6. 2. What and Where is Data Mining in the System. So what exactly is data mining? Data mining is the set of tools and techniques used to retrieve hidden knowledge and rules from data. The name can be misleading: the data is already there, so we do not need to 'mine' it. For ore mining you need hammers and shovels. For data mining, however, you need mathematics, statistics and probability, machine learning, computer programming, database techniques, ...
  • 7. 2. What and Where is Data Mining in the System. Where is data mining in the system? Day by day, employees use software (web, desktop, or mobile applications) and generate data by recording all of their business activities (customers, products, order details, contracts, …). This data is added to a database through online transaction processing (OLTP). An integration service collects data from several such OLTP databases into a common repository, the data warehouse. The data mining service then accesses the data warehouse to process the data.
  • 8. 3. Data Mining Techniques. What are the techniques in data mining? Many techniques can be applied in data mining. Broadly, we can classify them into 3 groups, or phases: Data Preprocessing, Data Analysis, Data Presentation.
  • 10. 3. Data Mining Techniques: Data Preprocessing. Collected data is often of poor quality, so it needs to be cleaned, formatted, transformed, and so on before it is analyzed. This is a very important process, and it is hard to describe in a general, abstract way. Here we will see a few examples of data preprocessing techniques: • Similarity Measure • Down Sampling • Dimension Reduction • Vectorization
  • 11. 3. Data Mining Techniques: Data Preprocessing, Similarity Measure. How can we know which objects are similar? Take three points A(x1,y1), B(x2,y2), and C(x3,y3). One option is to measure the distances D1 between A and B and D2 between A and C: if D1 < D2, then A is more similar to B than to C. Alternatively, every point can be represented as a vector, and we can measure the angle between each pair of vectors: 𝜶 between A and B, 𝜷 between A and C. If 𝜶 < 𝜷, then A is more similar to B than to C.
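
Both similarity measures from slide 11 can be computed directly. The sketch below uses Euclidean distance for D1 and D2 and the angle between position vectors for 𝜶 and 𝜷. The coordinates of A, B, and C are made-up values for illustration, and `angle_between` is a hypothetical helper name.

```python
from math import acos, degrees, dist, hypot

def angle_between(u, v):
    """Angle in degrees between two non-zero vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_theta = dot / (hypot(*u) * hypot(*v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return degrees(acos(max(-1.0, min(1.0, cos_theta))))

# Hypothetical coordinates, not taken from the slides.
A, B, C = (2.0, 2.0), (3.0, 3.5), (6.0, 1.0)

d1, d2 = dist(A, B), dist(A, C)                          # distance-based measure
alpha, beta = angle_between(A, B), angle_between(A, C)   # angle-based measure

print(f"D1 = {d1:.3f}, D2 = {d2:.3f}")
print(f"alpha = {alpha:.1f} deg, beta = {beta:.1f} deg")
```

For these sample points both measures agree that A is closer to B than to C. In general the two can disagree, because the angle ignores vector length while the distance does not.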
  • 12. 3. Data Mining Techniques: Data Preprocessing, Down Sampling. What if you have so much data that analyzing all of it is unnecessary and hurts performance? Just pick some of the objects to evaluate. Example: lay a grid with cell size 𝑔 over the data and keep only one object per cell. [Figure: the original data next to the down-sampled data, with a 𝑔 × 𝑔 grid cell marked.]
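
A minimal sketch of grid-based down sampling, assuming 2D points and assuming that "one object per cell" means keeping the first object seen in each cell (the slide does not say which object to keep). The sample points and the name `grid_downsample` are hypothetical.

```python
from math import floor

def grid_downsample(points, g):
    """Keep the first point that falls into each g-by-g grid cell."""
    kept = {}
    for x, y in points:
        cell = (floor(x / g), floor(y / g))   # integer index of the grid cell
        kept.setdefault(cell, (x, y))         # later points in the same cell are dropped
    return list(kept.values())

# Hypothetical data: two pairs of points share a cell when g = 1.0.
points = [(0.2, 0.3), (0.8, 0.9), (1.1, 0.4), (1.7, 0.2), (0.5, 1.6), (3.9, 3.9)]
sample = grid_downsample(points, g=1.0)
print(sample)
```

A larger 𝑔 keeps fewer objects. Other choices, such as keeping the point nearest the cell center or the average of the cell, fit the same structure.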
  • 13. 3. Data Mining Techniques: Data Preprocessing, Dimension Reduction. All the example data shown so far has 2 dimensions, that is, 2 attributes (X, Y). What if each object had ~10,000 attributes? This could reduce the performance and/or accuracy of data analysis algorithms, so we need some way to reduce the number of dimensions. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are two of the most effective methods for this.
  • 14. 3. Data Mining Techniques: Data Preprocessing, Dimension Reduction with PCA. [Figure: the original data on X and Y axes, and the same data projected onto the principal components 𝑃1 and 𝑃2.] We keep only the 𝑘 principal components with the highest eigenvalues. In the example above, we can set 𝑘 = 1 and keep 𝑃1 instead of both 𝑃1 and 𝑃2. In this way the number of dimensions is reduced.
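
To make the 𝑘 = 1 case concrete, the sketch below finds the first principal component 𝑃1 of the eight points from slide 3 and projects each point onto it, turning 2 attributes into 1. It uses power iteration on the covariance matrix, which is one simple way to obtain the eigenvector with the highest eigenvalue; production code would normally call a linear algebra library instead. The function name and the iteration count are assumptions.

```python
def first_principal_component(rows, iterations=200):
    """Return (unit vector P1, projection of each centered row onto P1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the all-ones start vector is not orthogonal to P1.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

rows = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
p1, scores = first_principal_component(rows)
print("P1 direction:", [round(x, 3) for x in p1])
print("1D representation:", [round(s, 2) for s in scores])
```

Each object is now described by a single number, its coordinate along 𝑃1, which is the direction that preserves the most variance.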
  • 15. 3. Data Mining Techniques: Data Preprocessing, Vectorization. Most data analysis algorithms take a set of vectors as input, so we need to transform the collected data into a set of vectors. Example: given the document "Mr A has not passed the exam this year. He will do it again next year", we extract the important words: "Mr A", "not", "pass", "exam", "again", "next", "year". Measuring the frequency of each word gives the vector that represents the document: Mr A: 1, not: 1, pass: 1, exam: 1, again: 1, next: 1, year: 2.
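
The word-frequency vector from slide 15 can be reproduced with a short sketch. It assumes the vocabulary of important words is already given, and it counts any word that starts with a vocabulary term, so that "passed" counts toward "pass". That prefix match is a crude stand-in for proper stemming and would also count unrelated words such as "nothing" toward "not".

```python
import re

def vectorize(text, vocabulary):
    """Count, for each vocabulary term, the words in text that start with it."""
    lowered = text.lower()
    return [
        len(re.findall(r"\b" + re.escape(term.lower()) + r"\w*", lowered))
        for term in vocabulary
    ]

document = "Mr A has not passed the exam this year. He will do it again next year"
vocabulary = ["Mr A", "not", "pass", "exam", "again", "next", "year"]

vector = vectorize(document, vocabulary)
print(dict(zip(vocabulary, vector)))
```

The resulting vector has one position per vocabulary term, so documents of any length map to vectors of the same size and can be compared with the similarity measures from slide 11.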
  • 17. 3. Data Mining Techniques: Data Analysis. This is the most important phase, where we find the hidden knowledge and rules in the data. There are many techniques in this phase: • Clustering • Classification • Regression • Rule Bases • …
  • 18. 3. Data Mining Techniques: Data Analysis, Clustering. Clustering is the process of grouping objects into groups (clusters). Objects in the same cluster are similar to each other; objects in different clusters are not. There are 2 types of clustering: partitional and hierarchical. In this presentation we look at an example of the best-known clustering method: K-means.
  • 19. 3. Data Mining Techniques: Data Analysis, Clustering with the K-means algorithm. 1. Randomly select K centers (centroids), one for each of the K clusters. 2. Calculate the distance from each object to each of the K centroids. 3. Assign each object to the group of its nearest centroid. 4. Compute the new centroid of each group. 5. Repeat from step 2 until the group assignments no longer change.
  • 20. 3. Data Mining Techniques: Data Analysis, Clustering with the K-means algorithm. Consider the data below; plotting it, we have: [Figure: data table and scatter plot.]
  • 21. 3. Data Mining Techniques: Data Analysis, Clustering with the K-means algorithm. Select K = 2 centroids. Compute the new positions of the centroids. Eventually the centroids stop changing. Each object belongs to the group of its closest centroid. The key point of the algorithm is selecting a good K.
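
The K-means steps from slide 19 can be sketched as follows. To keep the run reproducible, the initial centroids are passed in explicitly rather than selected at random; in practice step 1 could use something like `random.sample(points, k)`. The data is the table from slide 3, and the starting centroids are an arbitrary assumption.

```python
from math import dist

def k_means(points, centroids, max_iterations=100):
    """Return (label per point, final centroids), starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iterations):
        # Steps 2 and 3: assign each object to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Step 5: stop when the group assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its group.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty group keeps its old centroid
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
# Assumed starting centroids (K = 2): the first and fourth data points.
labels, centroids = k_means(points, centroids=[(3, 3), (4, 6)])
print("labels:", labels)
print("centroids:", centroids)
```

Different starting centroids can lead to different final groups, which is one reason K-means is often run several times. The choice of K itself has to be made before the algorithm starts.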
  • 22. 3. Data Mining Techniques: Data Analysis, Classification. How can we identify the group of an unclassified object? We could perform clustering to do this. But what if we already know some objects that were classified in the past? Can we do better than clustering? Yes. We can construct a prediction model that predicts the group of unclassified objects based on the classified ones. This process is called CLASSIFICATION.
  • 23. 3. Data Mining Techniques: Data Analysis, Classification. The process of classification can be described as follows: [Figure: diagram with the labels Learning Algorithm and Model.]
  • 24. 3. Data Mining Techniques: Data Analysis, Classification with SVM. Support Vector Machine (SVM) is a well-known classification method. It belongs to the group of linear classifiers. For example, take training data classified as red and blue. The classification model is a line defined by 𝑤, the normal vector, and 𝑏, the bias, which sets the line's offset from the origin. A point (𝑥, 𝑦) is classified as blue if 𝑤 · (𝑥, 𝑦) + 𝑏 > 0 and as red if 𝑤 · (𝑥, 𝑦) + 𝑏 < 0.
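
The sketch below shows only the prediction side of a linear classifier such as SVM: once 𝑤 and 𝑏 are known, classifying a point is a matter of checking the sign of 𝑤 · (𝑥, 𝑦) + 𝑏. Training, which is how SVM actually finds 𝑤 and 𝑏 from the red and blue examples, is not shown. The values of `w` and `b` here are hypothetical, picked by hand to describe the line x + y = 8.

```python
def classify(point, w, b):
    """Apply the linear decision rule: the sign of w . point + b picks the class."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    if score > 0:
        return "blue"
    if score < 0:
        return "red"
    return "on the boundary"

# Hypothetical model parameters (not learned from data): the line x + y = 8.
w, b = (1.0, 1.0), -8.0

for point in [(6, 7), (2, 3), (4, 4)]:
    print(point, "->", classify(point, w, b))
```

Points on one side of the line get a positive score and points on the other side a negative one, so the model needs to store only 𝑤 and 𝑏 rather than the training data.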
  • 25. 3. Data Mining Techniques: Data Analysis, Regression. Regression is also used for prediction, but to predict the missing value of an attribute. For example, given data points plotted on X and Y axes: • How do we find 𝑦𝑖 if 𝑥𝑖 is known? • We can estimate the line that describes the data. • Plug 𝑥𝑖 into the line equation to find 𝑦𝑖. • This is just an example of linear regression.
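
A minimal sketch of the linear regression idea from slide 25, assuming the line is estimated by ordinary least squares (the slide does not name the fitting method). The sample data is made up and lies exactly on y = 2x + 1, so the fit is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares estimate of slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations of (X, Y).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
x_i = 6                            # known attribute value
y_i = slope * x_i + intercept      # predicted missing value
print(f"y = {slope:.2f} * x + {intercept:.2f}; predicted y for x = {x_i}: {y_i:.2f}")
```

With real, noisy data the points will not lie exactly on the line, and the fitted line is the one that minimizes the sum of squared vertical errors.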
  • 26. 3. Data Mining Techniques: Data Analysis, Rule Base. Rule-based techniques find hidden patterns in the data. Examples: • Frequent pattern: customers who normally buy rice always buy vegetables. • Gradual pattern: young people want more expensive phones than others. • Sequential pattern: people always buy a laptop before buying a cell phone.
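
One simple way to check a frequent-pattern rule such as "customers who buy rice also buy vegetables" is to measure how often it holds in the recorded transactions. The sketch below computes that fraction, commonly called the confidence of the rule. The transactions are invented for illustration, and `rule_confidence` is a hypothetical helper; real rule mining algorithms also have to search for candidate rules, which is not shown.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing antecedent that also contain consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)

# Hypothetical shopping baskets.
transactions = [
    {"rice", "vegetable"},
    {"rice", "vegetable", "fish"},
    {"rice", "vegetable", "egg"},
    {"bread", "milk"},
    {"rice", "egg"},
]

confidence = rule_confidence(transactions, "rice", "vegetable")
print(f"rice -> vegetable holds in {confidence:.0%} of rice purchases")
```

In this toy data the rule holds in three of the four baskets that contain rice, so "always" would be too strong a word for it.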
  • 28. 3. Data Mining Techniques: Data Visualization. Techniques for presenting the knowledge you have retrieved to the user. [Figure: sample chart of Series 1, Series 2, and Series 3 across Category 1 to Category 4.]
  • 29. Thank you for your attention