Unsupervised Machine Learning With Python: Clustering. K-Means Clustering
The next few posts will introduce several of the many clustering algorithms available in Python. Since these are the first posts I am writing on each of these topics, we will not go into much detail for now. As the collection of posts grows, we will cover each algorithm in more depth. For now, the goal is to understand the overall logic and process behind each of these algorithms.
The K-Means Clustering Algorithm
K-means is one of the most popular strategies for clustering data. It is a flat (non-hierarchical) clustering method, and the number of clusters, K, must be chosen in advance. The algorithm is iterative and follows the steps listed below.
PHASE 1: SELECT THE NUMBER OF CLUSTERS
The required number of clusters, K, must be specified.
PHASE 2: ASSIGN DATA POINTS TO, AND ADJUST CLUSTERS [ITERATIVE PHASE]
After the number of clusters has been chosen, each data point is initially assigned to a cluster at random. In other words, the data is first split into K groups.
The centroid of each cluster (the mean of the points assigned to it) is then calculated, and each data point is reassigned to the cluster whose centroid is nearest.
Since this is an iterative procedure, the K centroids are recalculated after each reassignment until they stop moving. At that point the algorithm has converged to a local optimum, which is not guaranteed to be the global optimum and can depend on the initial assignment.
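The two phases above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the scikit-learn implementation: the function name kmeans_sketch and its parameters are hypothetical, and it starts from K randomly chosen data points as initial centroids (a common variant) instead of a random assignment of points.

```python
import numpy as np


def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Minimal K-means sketch (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids from k distinct, randomly chosen points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: label each point with its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster simply keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Two obvious groups of three points each.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 5.0]])
labels, centroids = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

On this tiny dataset the first three points should end up in one cluster and the last three in the other. Which cluster is numbered 0 and which is numbered 1 depends on the random initialisation.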
The K-means clustering technique can be implemented in Python with the following code. We will use scikit-learn, one of the most popular machine learning libraries today.
Clustering Example
We begin by importing the necessary packages into our script as follows:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import seaborn as sns
sns.set()
The make_blobs() function from the sklearn.datasets package creates a two-dimensional dataset with four blobs in the following code. It lets us generate dummy data points that are already grouped into clusters.
X, y_true = make_blobs(n_samples=500,
                       centers=4,
                       cluster_std=0.40,
                       random_state=0)
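As a quick check of what make_blobs() returns, the sketch below inspects the two arrays. It repeats the call from above so that it can run on its own. The shapes follow from the arguments used here and from the default of two features per sample.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

# X holds the coordinates: one row per sample, one column per feature.
print(X.shape)
# y_true holds the blob each sample was generated from.
print(y_true.shape)
# The distinct blob labels.
print(np.unique(y_true))
```

K-means never sees y_true. It is only useful afterwards, for comparing the discovered clusters against the blobs the points were generated from.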
Before clustering, we can get a feel for the generated data by plotting it with the Matplotlib package.
plt.title("Scatter Plot of the Generated Data Points")
plt.xlabel("X-AXIS")
plt.ylabel("Y-AXIS")
plt.scatter(X[:, 0], X[:, 1], s = 50)
plt.show()
The output of the visualization code above is a scatter plot of the 500 generated data points.
Now that we have a dataset, we may proceed to train a K-Means clustering model on our dummy data. We will instantiate an object of the KMeans class as follows:
algorithm = KMeans(n_clusters=4)
Next, we train the model by calling the .fit() method on the KMeans object, passing the input data as an argument. In scikit-learn, .fit() returns the fitted estimator itself, so model and algorithm refer to the same object:
model = algorithm.fit(X)
Next, we obtain the predicted cluster for each record (observation/row):
cluster_predictions = model.predict(X)
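The fitted model also stores the training labels in its labels_ attribute, so calling predict() on the same data is not strictly required. The sketch below compares the two and then uses predict() on two made-up points. The n_init and random_state arguments and the new_points array are assumptions added only to keep the example reproducible and self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

# n_init and random_state are set here only to make the sketch reproducible.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Labels stored during fitting versus labels from a fresh predict() call.
labels_from_fit = model.labels_
labels_from_predict = model.predict(X)
print(np.array_equal(labels_from_fit, labels_from_predict))

# predict() also accepts unseen points and assigns each to its nearest center.
new_points = np.array([[0.0, 0.0], [2.0, 1.0]])
new_labels = model.predict(new_points)
print(new_labels)
```

The cluster numbers themselves are arbitrary: they identify groups but carry no order, and they need not match the blob labels returned by make_blobs().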
We can also obtain the center of each cluster, given as an {x, y} coordinate pair:
centers = model.cluster_centers_
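To connect cluster_centers_ back to the centroid step described earlier, the sketch below recomputes each center by hand as the mean of the points assigned to it. It refits the model so that it runs on its own. The n_init and random_state arguments and the manual_centers and max_gap names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
centers = model.cluster_centers_
labels = model.labels_

# One row per cluster, one column per feature.
print(centers.shape)

# Recompute each centroid by hand as the mean of the points assigned to it.
manual_centers = np.array([X[labels == j].mean(axis=0) for j in range(4)])

# Largest absolute difference between the manual and fitted centers.
max_gap = np.abs(manual_centers - centers).max()
print(max_gap)
```

On well-separated data like this, the gap should be essentially zero. If the algorithm stops early because of its tolerance or iteration limit, the stored centers can differ slightly from the per-cluster means.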
Finally, we visualize the fitted KMeans model using Matplotlib, coloring each point by its predicted cluster and marking the cluster centers in black:
plt.title("Scatter Plot Showing K-Means Cluster Groups")
plt.xlabel("X-AXIS")
plt.ylabel("Y-AXIS")
plt.scatter(X[:, 0], X[:, 1], c = cluster_predictions, s = 50, cmap = 'viridis')
plt.scatter(centers[:, 0], centers[:, 1], c = 'black', s = 200, alpha = 0.5)
plt.show()