K-Means Clustering with Python: A Practical Guide
Understanding K-Means Clustering
K-means clustering is an unsupervised learning algorithm that divides a dataset into k clusters, or subgroups. We first choose the number of clusters, k. The algorithm then picks initial centroids (at random, or with a seeding scheme such as k-means++) and groups similar data points into those clusters. It iteratively updates the clusters' centroids until the clusters stabilize, which yields a locally optimal grouping.
How K-Means Clustering Works
The approach of K-means clustering is similar to optimizing a cost function in linear regression to find the best-fit line: here the cost being minimized is the sum of squared distances between each point and the centroid of its cluster. We start by selecting the number of clusters that we want to divide the data into. The key steps in the algorithm are: initialize one centroid per cluster, assign each data point to its nearest centroid, recompute each centroid as the mean of the points assigned to it, and repeat the last two steps until the assignments no longer change.
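The iterative procedure can be sketched in a few lines of NumPy. This is only an illustration under simple assumptions (random initialization from the data points, Euclidean distance, a fixed iteration cap); the function name `kmeans_sketch` and the synthetic points are hypothetical, and this is not how scikit-learn implements the algorithm internally.

```python
import numpy as np

def kmeans_sketch(points, k, n_iters=100, seed=0):
    """Illustrative K-means loop (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign every point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster simply keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic blobs, made up purely for illustration.
rng = np.random.default_rng(42)
demo_points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2)),
])
demo_labels, demo_centroids = kmeans_sketch(demo_points, k=2)
```

With blobs this far apart, each blob should end up in its own cluster. On real data the result depends on the initial centroids, which is one reason library implementations run several initializations and keep the best one.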
Advantages of K-Means Clustering:
K-means clustering is easy to interpret and implement, and it works well with large datasets because each iteration is inexpensive and the algorithm usually converges in relatively few iterations.
Applications of K-Means Clustering:
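K-means clustering is used in a variety of applications, such as segmenting individuals into groups based on measured attributes like height and weight, as in the example in this article.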
K-Means Clustering with Python:
We will be using the Python scikit-learn library, which is a popular library for machine learning in Python. The following are the libraries that need to be imported for the implementation of K-means clustering:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
```
Example Implementation of K-Means Clustering:
Let us see how we can implement K-means clustering for a given dataset. We will be working with a sample dataset that contains the heights and weights of individuals, where the aim is to segment individuals into different groups based on their weight and height.
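The file `sample_data.csv` is not provided with this article, and its column layout is not described. For readers who want something to run, the sketch below builds a synthetic stand-in. Everything in it is an assumption: the helper name `make_sample_data`, the column names, the group centers, and the choice to place height and weight at column positions 2 and 3 so that the column selection used in this article picks them up.

```python
import numpy as np
import pandas as pd

def make_sample_data(n_per_group=50, seed=0):
    """Build a synthetic height/weight table (hypothetical stand-in for sample_data.csv)."""
    rng = np.random.default_rng(seed)
    # Made-up (height in cm, weight in pounds) centers for three groups.
    centers = [(155.0, 110.0), (170.0, 150.0), (185.0, 200.0)]
    heights, weights = [], []
    for height_mean, weight_mean in centers:
        heights.append(rng.normal(height_mean, 3.0, n_per_group))
        weights.append(rng.normal(weight_mean, 8.0, n_per_group))
    n_rows = n_per_group * len(centers)
    return pd.DataFrame({
        "id": np.arange(n_rows),                   # column 0: filler
        "age": rng.integers(18, 65, size=n_rows),  # column 1: filler
        "height_cm": np.concatenate(heights),      # column 2
        "weight_lb": np.concatenate(weights),      # column 3
    })

sample_df = make_sample_data()
# Same positional selection as used in this article: columns 2 and 3.
x_demo = sample_df.iloc[:, [2, 3]].values
```

Calling `sample_df.to_csv('sample_data.csv', index=False)` would write the table to disk so that it can be loaded with `pd.read_csv`.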
```python
# Import the dataset
dataset = pd.read_csv('sample_data.csv')
# Select the columns at positions 2 and 3 (assumed here to hold height and weight)
x = dataset.iloc[:, [2, 3]].values
```
The above code imports the dataset and extracts the values of the required columns. We will then determine the optimal number of clusters for our dataset using the elbow method, which plots the WCSS (within-cluster sum of squares) against the number of clusters.
The optimal number of clusters is the point where the curve starts to flatten out (the "elbow"), beyond which adding more clusters reduces the WCSS only slightly.
```python
# Using the elbow method: fit K-means for k = 1..10 and record the WCSS
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(x)
    # inertia_ holds the within-cluster sum of squares of the fitted model
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
```
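To make the WCSS quantity concrete, the sketch below recomputes it by hand and compares it with the `inertia_` attribute of a fitted scikit-learn model. The data is synthetic and the blob centers are made up for illustration; they are not the article's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic (height, weight) blobs, made up for illustration.
rng = np.random.default_rng(0)
demo_x = np.vstack([
    rng.normal(loc=[155.0, 110.0], scale=4.0, size=(40, 2)),
    rng.normal(loc=[170.0, 150.0], scale=4.0, size=(40, 2)),
    rng.normal(loc=[185.0, 200.0], scale=4.0, size=(40, 2)),
])

model = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(demo_x)

# WCSS: squared distance from each point to the centroid of its own cluster, summed.
assigned_centroids = model.cluster_centers_[model.labels_]
manual_wcss = float(((demo_x - assigned_centroids) ** 2).sum())

print(manual_wcss, model.inertia_)
```

The two printed numbers should agree up to floating-point rounding, which is why appending `inertia_` for each value of k produces the WCSS curve.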
For this dataset, the elbow method gives us an optimal value of 3 clusters. We can now perform the clustering using the K-means clustering algorithm.
```python
# Training the K-means model on the dataset
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(x)
```
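Once a model is fitted, it can also assign points it has not seen before to the nearest learned centroid. The sketch below shows this with `predict`, again on synthetic data; the blob centers and the two "new" height and weight pairs are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic (height, weight) blobs, made up for illustration.
rng = np.random.default_rng(1)
demo_x = np.vstack([
    rng.normal(loc=[155.0, 110.0], scale=4.0, size=(40, 2)),
    rng.normal(loc=[170.0, 150.0], scale=4.0, size=(40, 2)),
    rng.normal(loc=[185.0, 200.0], scale=4.0, size=(40, 2)),
])

demo_kmeans = KMeans(n_clusters=3, init="k-means++", max_iter=300, n_init=10, random_state=0)
# fit_predict returns one integer label per row; labels_ stores the same array.
demo_labels = demo_kmeans.fit_predict(demo_x)

# predict assigns unseen points to the nearest learned centroid.
new_people = np.array([[156.0, 112.0], [184.0, 198.0]])  # hypothetical measurements
new_labels = demo_kmeans.predict(new_people)

# Number of training points that landed in each cluster.
cluster_sizes = np.bincount(demo_labels, minlength=3)
print(new_labels, cluster_sizes)
```

The integer labels themselves are arbitrary: which blob is called cluster 0, 1, or 2 can change with the random seed, so only the grouping is meaningful.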
Finally, we plot the clusters:
```python
# Visualizing the clusters in the dataset
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s=100, c='red', label='Cluster 1')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s=100, c='blue', label='Cluster 2')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s=100, c='green', label='Cluster 3')
# Overlay the learned centroids as larger markers
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
plt.title('Clusters of individuals')
plt.xlabel('Height (in cm)')
plt.ylabel('Weight (in pounds)')
plt.legend()
plt.show()
```
Conclusion
K-means clustering is an efficient algorithm that helps us divide a given dataset into k clusters. It is easy to implement and interpret, and it works well with large datasets. In this article, we learned how to implement K-means clustering, determine the optimal number of clusters using the elbow method, and visualize the clusters on a sample dataset using the Python scikit-learn library.