K-MEANS CLUSTERING
INTRODUCTION: What is clustering?
• Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often according to a defined distance measure.
K-MEANS CLUSTERING
• The k-means algorithm clusters n objects, based on their attributes, into k partitions, where k < n.
• It is an algorithm for partitioning (or clustering) N data points into K disjoint subsets S_j (the K clusters) so as to minimize the sum-of-squares criterion

$$J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - c_j \rVert^2$$

where x_i is a vector representing the i-th data point and c_j is the geometric centroid of the data points in S_j (the j-th cluster).
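The criterion J can be computed directly from its definition. This is a minimal sketch, assuming each cluster S_j is given as a list of points and each point is a tuple of coordinates; the function names are illustrative and not taken from the slides.

```python
def squared_distance(x, c):
    """||x - c||^2 for two points given as equal-length tuples."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def centroid(points):
    """Geometric centroid (coordinate-wise mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sum_of_squares(clusters):
    """J = sum over clusters S_j of sum over x_i in S_j of ||x_i - c_j||^2,
    where c_j is the centroid of S_j."""
    total = 0.0
    for members in clusters:
        c = centroid(members)  # c_j for this cluster
        total += sum(squared_distance(x, c) for x in members)
    return total
```

For example, `sum_of_squares([[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]])` returns 2.0: each point of the first cluster lies at distance 1 from its centroid (1.0, 0.0), and the single point of the second cluster is its own centroid.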
• Simply speaking, k-means clustering is an algorithm that classifies, or groups, objects into K groups based on their attributes/features.
• K is a positive integer.
• The grouping is done by minimizing the sum of squared distances between the data points and their corresponding cluster centroids.
How does the k-means clustering algorithm work?
• Initialization: once the number of groups, k, has been chosen, k centroids are established in the data space, for instance by choosing them randomly.
• Assignment of objects to the centroids: each object in the data is assigned to its nearest centroid.
• Centroid update: the position of each group's centroid is updated, taking as the new centroid the average position of the objects belonging to that group. The assignment and update steps are then repeated until no object changes group.
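The three phases can be sketched in plain Python as the batch variant usually called Lloyd's algorithm, in which every object is reassigned before any centroid moves (the step-by-step procedure that follows instead updates centroids one sample at a time). This is a minimal sketch, assuming points are equal-length tuples and that the initial centroids are k distinct data points picked at random; the names `kmeans`, `seed` and `max_iter` are illustrative.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=None, max_iter=100):
    """Batch k-means: returns (labels, centroids), labels[i] being the cluster of points[i]."""
    rng = random.Random(seed)
    # Initialization: k distinct data points become the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest centroid (ties -> lowest index).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed group, so the algorithm has converged
        labels = new_labels
        # Update: each centroid moves to the mean position of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return labels, centroids
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2, seed=1)` separates the two pairs, with centroids (0.0, 0.5) and (10.0, 10.5) in some order.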
• Step 1: Begin by deciding on the value of k, the number of clusters.
• Step 2: Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
1. Take the first k training samples as single-element clusters.
2. Assign each of the remaining (N - k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the cluster that gained the sample.
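The systematic option of Step 2 can be sketched as follows. This is an illustrative sketch, assuming samples are tuples of coordinates and that there are at least k of them; the function name `systematic_initial_partition` is hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def systematic_initial_partition(samples, k):
    """Step 2 (systematic variant): build an initial partition into k clusters."""
    # 1. The first k samples each form a single-element cluster.
    clusters = [[s] for s in samples[:k]]
    centroids = [tuple(s) for s in samples[:k]]
    # 2. Every remaining sample joins the cluster with the nearest centroid,
    #    and that cluster's centroid is recomputed right after the assignment.
    for s in samples[k:]:
        j = min(range(k), key=lambda c: squared_distance(s, centroids[c]))
        clusters[j].append(s)
        centroids[j] = mean(clusters[j])
    return clusters, centroids
```

For the samples `[(0, 0), (10, 0), (1, 0), (9, 0)]` with k = 2, the first two samples seed the clusters, (1, 0) joins the first and (9, 0) the second, giving centroids (0.5, 0.0) and (9.5, 0.0).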
• Step 3: Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
• Step 4: Repeat Step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
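Steps 3 and 4 can be sketched as a loop of sequential passes over the samples. This is a minimal sketch, assuming the starting partition is given as a list of cluster labels in which every one of the k clusters is non-empty; the name `refine_sequentially` and the `max_passes` safety limit are illustrative.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def refine_sequentially(samples, labels, k, max_passes=100):
    """Steps 3-4: move samples one at a time until a full pass changes nothing."""
    labels = list(labels)
    centroids = [mean([s for s, lab in zip(samples, labels) if lab == j]) for j in range(k)]
    for _ in range(max_passes):
        changed = False
        for i, s in enumerate(samples):  # Step 3: take each sample in sequence
            current = labels[i]
            best = min(range(k), key=lambda j: squared_distance(s, centroids[j]))
            # Switch only when another centroid is strictly closer; a sample that is
            # alone in its cluster sits on its centroid, so no cluster is ever emptied.
            if squared_distance(s, centroids[best]) < squared_distance(s, centroids[current]):
                labels[i] = best
                changed = True
                # Update the centroids of the cluster gaining and the cluster losing the sample.
                for j in (best, current):
                    centroids[j] = mean([p for p, lab in zip(samples, labels) if lab == j])
        if not changed:
            break  # Step 4: a pass with no new assignments means convergence
    return labels, centroids
```

Starting from the poor partition `[0, 1, 1, 1]` of the samples `[(0, 0), (1, 0), (10, 0), (11, 0)]`, the first pass moves (1, 0) into cluster 0 and the second pass changes nothing, leaving labels `[0, 0, 1, 1]` and centroids (0.5, 0.0) and (10.5, 0.0).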
A simple example showing the implementation of the k-means algorithm (using K = 2)
Step 1:
Initialization: we randomly choose the following two centroids (k = 2) for the two clusters. In this case, the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).
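Step 2:
• Assigning each individual to its nearest centroid (see the distances below), we obtain two clusters containing {1,2,3} and {4,5,6,7}. Individual 3 is equally distant (3.61) from both centroids and is placed in cluster 1.
• Their new centroids are then computed as the mean position of each cluster's members.

Distance from individual points to the two centroids
Individual   Centroid 1   Centroid 2
1            0            7.21
2            1.12         6.10
3            3.61         3.61
4            7.21         0
5            4.72         2.50
6            5.31         2.06
7            4.30         2.92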
Step 3:
• Now, using these centroids, we compute the Euclidean distance of each object from each centroid, as shown in the table.
• Therefore, the new clusters are {1,2} and {3,4,5,6,7}.
• The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).

Distance from individual points to the two centroids
Individual   Centroid 1   Centroid 2
1            1.57         5.38
2            0.47         4.28
3            2.04         1.78
4            5.64         1.84
5            3.15         0.73
6            3.78         0.54
7            2.74         1.08
Step 4:
• The clusters obtained are {1,2} and {3,4,5,6,7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm comes to a halt here, and the final result consists of 2 clusters: {1,2} and {3,4,5,6,7}.

Distance from individual points to the centroids m1 = (1.25, 1.5) and m2 = (3.9, 5.1)
Individual   Centroid 1   Centroid 2
1            0.56         5.02
2            0.56         3.92
3            3.05         1.42
4            6.66         2.20
5            4.16         0.41
6            4.78         0.61
7            3.75         0.72
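The worked example can be replayed with batch k-means started from m1 and m2. The slides do not list the coordinates of the seven individuals, so the values below are an assumption, chosen because they reproduce every distance in the three tables (for instance, individual 2 at (1.5, 2.0) is 1.12 from m1 and 6.10 from m2). The helper names are illustrative.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_from(points, centroids, max_iter=100):
    """Batch k-means from given initial centroids; returns (labels, centroids) per iteration."""
    centroids = list(centroids)
    k = len(centroids)
    history, labels = [], None
    for _ in range(max_iter):
        # Nearest centroid for every point; min() keeps the lower index on a tie,
        # which is how individual 3 (equally far from both centroids) joins cluster 1.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no change in the clusters: the algorithm halts
        labels = new_labels
        groups = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
        history.append((labels, centroids))
    return history


# ASSUMED coordinates of individuals 1..7 (not listed on the slides); they
# reproduce every distance in the three distance tables.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]

history = kmeans_from(points, [(1.0, 1.0), (5.0, 7.0)])  # m1 and m2 from Step 1
for step, (labels, centroids) in enumerate(history, start=1):
    groups = [[i + 1 for i, lab in enumerate(labels) if lab == j] for j in range(2)]
    print(step, groups, [tuple(round(c, 2) for c in m) for m in centroids])
```

It prints one line per iteration: the clusters [[1, 2, 3], [4, 5, 6, 7]] after the first, and [[1, 2], [3, 4, 5, 6, 7]] with centroids (1.25, 1.5) and (3.9, 5.1) after the second. The third pass changes nothing, so the loop stops. With these assumed coordinates, the centroids after the first assignment are (11/6, 7/3) ≈ (1.83, 2.33) and (4.125, 5.375).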
PLOT: k-means with K=3 (panels: Step 1, Step 2)
Elbow Method (choosing the number of clusters)
Another method: the silhouette coefficient (self study)
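In the usual elbow method, k-means is run for a range of k values, the resulting sum-of-squares criterion (SSE) is plotted against k, and k is chosen at the "elbow", where adding further clusters stops reducing the SSE substantially. A minimal plain-Python sketch that computes the curve is shown below; keeping the best of several random starts per k (`n_init`) is an assumption added to soften the sensitivity to initialization, and all names are illustrative.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_sse(points, k, rng, max_iter=100):
    """One batch k-means run from k randomly chosen data points; returns the final SSE."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow_curve(points, max_k, n_init=10, seed=0):
    """Best SSE over n_init random starts, for each k = 1..max_k."""
    rng = random.Random(seed)
    return [min(kmeans_sse(points, k, rng) for _ in range(n_init)) for k in range(1, max_k + 1)]
```

For six points forming three tight pairs, such as `[(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]`, the SSE from `elbow_curve(points, 6)` should fall steeply up to k = 3 and only slightly after that, placing the elbow at k = 3. With k equal to the number of points, every point is its own centroid and the SSE is 0.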
Weaknesses of K-Means Clustering
1. When the number of data points is small, the initial grouping largely determines the resulting clusters.
2. The number of clusters, K, must be determined beforehand. In addition, the algorithm does not yield the same result on every run, since the resulting clusters depend on the initial random assignments.
3. Even with the same data, we never know the real clusters: if the data are input in a different order, a different clustering may be produced, especially when the number of data points is small.
4. It is sensitive to initial conditions. Different initial conditions may produce different clusterings, and the algorithm may be trapped in a local optimum.
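Weaknesses 2 and 4 can be seen on four points at the corners of a 4 x 1 rectangle. This sketch (plain Python; the data and helper names are illustrative) runs the same batch k-means with K = 2 from two different pairs of initial centroids and reports the sum-of-squares criterion (SSE) each run ends with.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_from(points, centroids, max_iter=100):
    """Batch k-means from the given initial centroids; returns labels, centroids, SSE."""
    centroids = list(centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        groups = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Four corners of a 4 x 1 rectangle; the natural split is left pair vs right pair.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: both centroids on the left edge, which leads to a bottom-vs-top split.
labels_a, _, sse_a = kmeans_from(points, [(0.0, 0.0), (0.0, 1.0)])
# Start B: one centroid on each side, which leads to a left-vs-right split.
labels_b, _, sse_b = kmeans_from(points, [(0.0, 0.0), (4.0, 0.0)])
print(labels_a, sse_a)  # [0, 1, 0, 1] 16.0
print(labels_b, sse_b)  # [0, 0, 1, 1] 1.0
```

Both runs stop because no point changes cluster, yet start A is stuck in a local optimum whose SSE is sixteen times that of start B.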
Applications of K-Means Clustering
• It is relatively efficient and fast: each iteration takes time proportional to n·k·d for n objects, k clusters and d attributes.
• k-means clustering can be applied in machine learning and data mining.
• It is used on acoustic data in speech understanding to convert waveforms into one of k categories, and in image segmentation.
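As a toy illustration of the segmentation use, the sketch below clusters the intensities of a tiny grayscale image (given as a list of equal-length rows) and returns a cluster label per pixel. It is a sketch under assumptions: segmenting on intensity alone, the evenly spaced initial centroids and the name `segment_gray` are illustrative choices, not taken from the slides.

```python
def segment_gray(image, k=2, max_iter=100):
    """Label each pixel of a grayscale image (list of rows) by 1-D k-means on intensity."""
    pixels = [v for row in image for v in row]
    lo, hi = min(pixels), max(pixels)
    # Assumed initialization: k centroids evenly spaced from the darkest to the brightest pixel.
    centroids = [lo + j * (hi - lo) / max(k - 1, 1) for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every pixel to the nearest intensity centroid (ties -> lowest index).
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in pixels]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(pixels, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = sum(members) / len(members)
    # Reshape the flat label list back into the image's rows.
    width = len(image[0])
    return [labels[r * width:(r + 1) * width] for r in range(len(image))]
```

For example, `segment_gray([[10, 12, 200], [11, 198, 205]])` returns `[[0, 0, 1], [0, 1, 1]]`, separating the dark pixels from the bright ones; replacing each label by its centroid intensity would give the corresponding two-level image.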
Visualization: Example
Application: Segmentation