Unsupervised Machine Learning With Python: Clustering. K-Means Clustering

The next few posts will introduce a few of the many clustering algorithms available in Python. We will not go into much detail for now, as these are the first posts I am writing on each of these topics. As the collection of posts grows, we will dive deeper into each of these interesting algorithms. For now, the important thing is to understand the overall logic and process behind each one.

The K-Means Clustering Algorithm

K-means is one of the most popular strategies for clustering data. It is a flat (non-hierarchical) clustering method, and it requires the number of clusters, K, to be chosen in advance. The algorithm is iterative and proceeds through the phases listed below.

PHASE 1: SELECT THE NUMBER OF CLUSTERS

The required number of K clusters must be specified.
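The post does not say how to pick K. One common heuristic, offered here only as a sketch, is to fit the model for several candidate values and compare the inertia_ attribute that scikit-learn's KMeans exposes (the sum of squared distances from each point to its nearest centroid), looking for the point where adding clusters stops helping much. The candidate range 1 to 6, the n_init and random_state values, and the synthetic data (the same kind used in the example later in this post) are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four blobs (illustrative parameters).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

# Fit one model per candidate K and record its inertia
# (sum of squared distances from each point to its nearest centroid).
candidate_ks = range(1, 7)
inertias = []
for k in candidate_ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

for k, inertia in zip(candidate_ks, inertias):
    print(f"K={k}: inertia={inertia:.1f}")
```

Inertia generally keeps shrinking as K grows, so the value to look for is the "elbow" where the improvement levels off, not the minimum.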

PHASE 2: ASSIGN DATA POINTS TO, AND ADJUST CLUSTERS [ITERATIVE PHASE]

Once the number of clusters is fixed, each data point is given an initial cluster assignment at random. In other words, we start by splitting the data into K groups.

The centroid of each cluster (the mean of the points assigned to it) is then calculated, and each data point is reassigned to the cluster whose centroid is nearest.

Since this is an iterative procedure, the K centroids' locations are updated after each iteration until the assignments stop changing. At that point the centroids have converged to a local optimum, which is not guaranteed to be the global one, so the result can depend on the initial assignment.
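The two phases can be sketched in plain NumPy as follows. This is a minimal illustration of the idea, not scikit-learn's implementation (which defaults to k-means++ initialization); the function name kmeans_sketch, its parameters, the handling of empty clusters, and the tiny dataset are all assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Minimal k-means loop: compute centroids, reassign points, repeat."""
    rng = np.random.default_rng(seed)
    # Random initial assignment; cycling through 0..k-1 before shuffling
    # guarantees that every cluster starts with at least one point.
    labels = rng.permutation(np.arange(len(X)) % k)
    for _ in range(n_iters):
        # Centroid of each cluster = mean of the points currently assigned to it.
        # If a cluster has become empty, reseed it at a random data point.
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j)
            else X[rng.integers(len(X))]
            for j in range(k)
        ])
        # Reassign every point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        converged = np.array_equal(new_labels, labels)
        labels = new_labels
        if converged:
            break  # assignments stopped changing
    return labels, centroids

# Two obvious groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, centroids = kmeans_sketch(X, k=2)
print(labels)
print(centroids)
```

Which group ends up numbered 0 and which 1 depends on the random start; only the grouping itself is meaningful.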

The K-means clustering technique can be implemented in Python with the following code. We will use scikit-learn, one of the most popular machine learning libraries at present.

Clustering Example

We begin by importing the necessary packages into our script:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import seaborn as sns

sns.set()        

The make_blobs() function from the sklearn.datasets module generates dummy data points grouped into clusters. Here we use it to create a two-dimensional dataset of 500 points arranged in four blobs:

X, y_true = make_blobs(n_samples = 500,
                       centers = 4,
                       cluster_std = 0.40,
                       random_state = 0)        
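To see what make_blobs() returns, the short check below inspects the two arrays. It repeats the call above so that it runs on its own; nothing here goes beyond the parameters already used in this post.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

print(X.shape)              # (500, 2): 500 points, 2 features each
print(y_true.shape)         # (500,): one blob label per point
print(np.unique(y_true))    # [0 1 2 3]: the four blobs
print(np.bincount(y_true))  # [125 125 125 125]: points are split evenly
```

X holds the coordinates used for clustering, while y_true records which blob each point was generated from. K-means never sees y_true.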

Before clustering, we can get a feel for the dataset that has been created for us by plotting it with Matplotlib.

plt.title("Scatter Plot of the Generated Data")
plt.xlabel("X-AXIS")
plt.ylabel("Y-AXIS")
plt.scatter(X[:, 0], X[:, 1], s = 50)
plt.show()        

The code above outputs a scatter plot of the generated data points (figure not reproduced here).

Now that we have a dataset, we can train a K-Means clustering model on our dummy data. We first instantiate an object of the KMeans class:

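# n_init and random_state are set explicitly so that repeated runs give the same result
algorithm = KMeans(n_clusters=4, n_init=10, random_state=0)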

Next, we train the model by calling the .fit() method on the KMeans object, passing the input data as an argument:

model = algorithm.fit(X)        
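It can help to look at what fit() leaves behind. The sketch below repeats the setup so that it runs on its own; the n_init and random_state values are assumptions added only to make the run repeatable.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

algorithm = KMeans(n_clusters=4, n_init=10, random_state=0)
model = algorithm.fit(X)

# fit() returns the estimator itself, so both names point to the same object.
print(model is algorithm)

# Number of iterations the best run needed, and its final inertia
# (sum of squared distances from each point to its nearest centroid).
print(model.n_iter_)
print(model.inertia_)
```

Because model and algorithm are the same object, calling algorithm.predict(X) later would work just as well as model.predict(X).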

Next, we obtain the predicted cluster for each record (observation/row):

cluster_predictions = model.predict(X)        
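The predictions are simply an array of cluster indices, one per row of X. The sketch below inspects them and also labels two unseen points; the new_points coordinates, n_init, and random_state are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

cluster_predictions = model.predict(X)

# On the training data, predict() agrees with the labels stored during fit().
print(np.array_equal(cluster_predictions, model.labels_))

# How many points landed in each of the four clusters.
counts = np.bincount(cluster_predictions, minlength=4)
print(counts)

# predict() also works on points the model has never seen:
# each one is assigned to the cluster with the nearest center.
new_points = np.array([[0.0, 0.0], [1.0, 4.0]])  # hypothetical coordinates
print(model.predict(new_points))
```

The cluster numbers themselves are arbitrary and may differ between runs with different seeds; only the grouping is meaningful.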

We can also obtain the center of each distinct cluster group, given as a pair of {x, y} coordinates:

centers = model.cluster_centers_        
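As a sanity check on what these centers mean, the sketch below recomputes each one by hand as the mean of the points assigned to that cluster and compares it with cluster_centers_. The n_init and random_state values are assumptions, and the two results are only expected to match closely when the fit has fully converged.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

centers = model.cluster_centers_
labels = model.labels_
print(centers.shape)  # one row per cluster, one column per feature

# Recompute each centroid by hand as the mean of the points assigned to it.
manual_centers = np.array([X[labels == k].mean(axis=0) for k in range(4)])

# Largest absolute difference between the library's centers and ours.
max_diff = np.abs(centers - manual_centers).max()
print(max_diff)
```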

Finally, we visualize the KMeans model using Matplotlib, coloring each point by its predicted cluster and marking the cluster centers in black:

plt.title("Scatter Plot Showing K-Means Cluster Groups")
plt.xlabel("X-AXIS")
plt.ylabel("Y-AXIS")
plt.scatter(X[:, 0], X[:, 1], c = cluster_predictions, s = 50, cmap = 'viridis')
plt.scatter(centers[:, 0], centers[:, 1], c = 'black', s = 200, alpha = 0.5)
plt.show()        
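Since make_blobs() also returned y_true, the blob each point was generated from, the clustering can be checked numerically as well as visually. The sketch below uses scikit-learn's adjusted_rand_score for this; the choice of metric, n_init, and random_state are assumptions, not part of the original walkthrough.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)
cluster_predictions = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Cluster numbers are arbitrary, so compare the groupings rather than the raw labels.
# A score of 1.0 means identical groupings; values near 0 mean chance-level agreement.
score = adjusted_rand_score(y_true, cluster_predictions)
print(f"Adjusted Rand index: {score:.3f}")
```

With real data there is usually no y_true to compare against, which is what makes clustering an unsupervised task.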