Unsupervised Machine Learning — A Friendly, Step-by-Step Tutorial


Imagine walking into a library where not a single book is labelled, and you are expected to arrange the collection. You would most likely start sorting books by appearance or feel, by similar cover art, length, or topic, even though nobody told you the genres. That is the key idea of unsupervised learning: it looks for structure in data when no annotated answers are available.

This tutorial explains what unsupervised learning means, how it differs from supervised learning, and summarizes the most common families of algorithms: clustering and dimensionality reduction. It also builds the mathematical intuition behind these methods and walks through K-Means and PCA with short, commented Python examples.


What is Unsupervised Learning — and why does it matter?

Unsupervised learning refers to a family of methods that discover patterns, clusters, or structure in data without any target labels. The model does not learn an input-to-output mapping; instead, it learns the internal structure of the data itself.

Why it matters:

  • Most real-world data is not labelled, and labelling is expensive and time-consuming.
  • Exploration & discovery: it helps find groups, anomalies, and structure before launching costly label-based projects.
  • Dimensionality reduction, one of its subfields, makes complex data easier to visualize and process.
  • Useful for preprocessing, feature engineering, anomaly detection, recommendations, and more.


Key differences from supervised learning (simple examples)

Supervised learning

  • Learns from labeled examples: (input, correct output) pairs.

  • Example: training an email spam filter on a large number of emails labelled spam or not spam.

  • Goal: predict the label of new, unseen data.

Unsupervised learning

  • There are no labels; the model discovers structure on its own.

  • Example: partitioning customer purchase records into segments (no segment labels are given).

  • Goal: find the patterns behind the data, or compress and summarize it effectively.

Short table:

            | Supervised                          | Unsupervised
  Data      | Labeled (input, correct output)     | Unlabeled inputs only
  Example   | Email spam classification           | Customer segmentation
  Goal      | Predict labels for new data         | Discover structure / compress data


Main types of Unsupervised Learning


A. Clustering — grouping similar data points

  • K-Means: Partitions the data into k clusters, each represented by a centroid (cluster centre).

  • Hierarchical Clustering: Builds a tree (dendrogram) showing how clusters merge or split.

  • DBSCAN: A density-based algorithm that detects clusters of arbitrary shape and flags noise/outliers.

B. Dimensionality Reduction — simplifying high-dimensional data

  • PCA (Principal Component Analysis): A linear method that finds the directions (principal components) along which the data varies the most.

  • t-SNE: A nonlinear method used mainly for visualization; it preserves local structure.

  • UMAP: A modern nonlinear method analogous to t-SNE and also effective for visualization.


Intuition + simple math (beginner-friendly)

K-Means (intuitive)

  • Pick k (number of clusters).

  • Randomly place k centroids.

  • Repeat:

    1. Each point is allocated to the closest centroid.

    2. Move each centroid to the mean of its assigned points.

  • Stop when the assignments no longer change (a minimal from-scratch sketch follows below).

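The loop above can be written out directly. Below is a minimal NumPy sketch of the two alternating steps on a small toy dataset; it is illustrative rather than production code (for instance, empty clusters are not handled), and the function name and toy data are just placeholders.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means on an (n_samples, n_features) array X."""
    rng = np.random.default_rng(seed)
    # Randomly pick k data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # Stop once nothing moves any more (the assignments have converged).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy usage: two well-separated blobs, so k=2 clusters are easy to recover.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)
```
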
Objective (what K-Means minimizes): the sum of within-cluster squared distances:

J = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|^2

This formula represents the total within-cluster variance — the quantity K-Means tries to minimize. Here’s what each term means:

  • J: Total clustering cost (what K-Means minimizes).

  • k: Number of clusters.

  • C_i: Set of points assigned to cluster i.

  • x: A data point.

  • \mu_i: Centroid (mean) of cluster i.

  • \|x - \mu_i\|^2: Squared distance between a point and its centroid.

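As a sketch of the same idea with scikit-learn (assuming it is installed), the snippet below clusters the Iris flower measurements, which the interpretation notes later in this tutorial refer to. The fitted model's inertia_ is exactly the objective J above, and the silhouette score gives a quick quality check.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = load_iris().data                            # 150 flowers, 4 numeric features
X_scaled = StandardScaler().fit_transform(X)    # scale features before K-Means

# k-means++ initialization and several restarts (n_init) help avoid poor local optima.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
labels = km.fit_predict(X_scaled)

print("Objective J (inertia):", km.inertia_)
print("Silhouette score:", silhouette_score(X_scaled, labels))
```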

PCA (intuitive)

  • Find a new coordinate system where the first axis (PC1) captures the most variance, PC2 the next most (and is orthogonal to PC1), and so on.

  • You can project high-dimensional data onto the first few principal components for visualization or to reduce noise.


Explained variance ratio of the i-th principal component:

\text{explained variance ratio}_i = \frac{\lambda_i}{\sum_{j} \lambda_j}

where \lambda_i is the variance captured along the i-th component. This ratio tells you how much of the total variance in the data is captured by the i-th principal component. It’s commonly used in PCA (Principal Component Analysis) to judge how many components are worth keeping.

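To make this concrete, here is a small sketch, assuming scikit-learn and matplotlib are available, that scales the Iris data, fits K-Means, projects everything onto the first two principal components, and marks the centroids with red Xs. This is the kind of figure the next section interprets.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_iris().data)
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42).fit(X_scaled)

# Project the 4-D data (and the centroids) onto the first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
centroids_2d = pca.transform(km.cluster_centers_)
print("Explained variance ratio per PC:", pca.explained_variance_ratio_)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=km.labels_, cmap="viridis", s=30)
plt.scatter(centroids_2d[:, 0], centroids_2d[:, 1], c="red", marker="X", s=200, label="centroids")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()
```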

Explanation of results (how to interpret)

  • Clusters: Points with the same color belong to the same cluster assigned by KMeans.

  • Centroids: The red Xs are the cluster centres; each represents the “average” member of its cluster.

  • Silhouette score: Gives a numeric sense of clustering quality. For Iris, you usually get a moderately good score since species are somewhat separable.


Short notes on Hierarchical and DBSCAN (intuition)

Hierarchical clustering

  • Build a tree of clusters (dendrogram).

  • Good for small datasets and when you want multi-scale cluster views.

  • You can “cut” the tree to get a chosen number of clusters.

DBSCAN

  • Parameters: eps (radius), min_samples.

  • Dense regions (core points) form clusters; low-density points are labeled noise.

  • Great for clusters with weird shapes and automatic outlier detection.

  • Not well suited to widely varying densities or very high dimensions; a minimal sketch of both methods follows below.

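As a rough sketch of both methods, assuming scikit-learn, SciPy, and matplotlib are available, the snippet below clusters a toy two-moons dataset where cluster shape matters; the DBSCAN eps value is a guess and would normally need tuning.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.cluster import AgglomerativeClustering, DBSCAN
from scipy.cluster.hierarchy import linkage, dendrogram

# Two interleaving half-moons: a shape that centroid-based K-Means struggles with.
X, _ = make_moons(n_samples=300, noise=0.06, random_state=0)

# Hierarchical clustering: build the merge tree, plot it, then "cut" it into 2 clusters.
Z = linkage(X, method="ward")
dendrogram(Z, truncate_mode="level", p=4)   # show only the top levels of the tree
plt.show()
hier_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

# DBSCAN: eps is the neighbourhood radius, min_samples the density threshold.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print("DBSCAN clusters:", len(set(db_labels) - {-1}),
      "| noise points (label -1):", int(np.sum(db_labels == -1)))
```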

t-SNE (very short overview)

  • t-SNE is a nonlinear projection for visualization (keeps local neighbourhoods intact).

  • Good for visualizing clusters on high-dimensional data, but:

    • It’s stochastic (use random_state).

    • It doesn’t preserve global distances well.

    • Use it only for visualization, not as a general dimensionality reduction step for modelling; a short sketch follows below.

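A short sketch, assuming scikit-learn and matplotlib, that embeds the 64-dimensional handwritten-digits data into 2-D with t-SNE; the digit labels are used only to colour the points, never to fit the embedding.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()                      # 1,797 images, 64 features each

# Nonlinear 2-D embedding; random_state fixes the otherwise stochastic result.
X_2d = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(digits.data)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=digits.target, cmap="tab10", s=10)
plt.title("t-SNE view of the digits data (labels used only for colouring)")
plt.show()
```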

Common challenges & how to overcome them

  1. Choosing the right number of clusters (k)

    • Use the elbow method, silhouette score, or domain knowledge (a minimal sketch follows this list).

  2. Feature scaling

    • Always scale numeric features before KMeans and PCA.

  3. Outliers influence KMeans

    • Use robust methods (DBSCAN) or remove/clip outliers beforehand.

  4. Cluster evaluation

    • No ground truth: use silhouette, Davies-Bouldin, or compare to business metrics.

  5. High dimensionality

    • Use PCA/UMAP to reduce dimensionality before clustering.

  6. Interpretability

    • Summarize clusters with representative examples or feature means.

  7. Different data types

    • For categorical features, use appropriate encodings or distance measures (K-Prototypes, Gower distance).

  8. Local optima / initialization

    • For KMeans, use multiple n_init runs and good init (like 'k-means++').

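As a minimal sketch of challenge 1, assuming scikit-learn and matplotlib, the snippet below sweeps k on the scaled Iris data, plots the inertia “elbow” next to the silhouette score, and picks the k with the best silhouette.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X_scaled = StandardScaler().fit_transform(load_iris().data)

ks = list(range(2, 9))
inertias, silhouettes = [], []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(km.inertia_)                                   # for the elbow plot
    silhouettes.append(silhouette_score(X_scaled, km.labels_))     # clustering quality

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(ks, inertias, "o-")
ax1.set_xlabel("k")
ax1.set_ylabel("inertia (elbow method)")
ax2.plot(ks, silhouettes, "o-")
ax2.set_xlabel("k")
ax2.set_ylabel("silhouette score")
plt.show()

print("Best k by silhouette:", ks[silhouettes.index(max(silhouettes))])
```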

Real-world applications (simple examples)

  • Marketing: Segment customers for targeted campaigns (group by purchase patterns).

  • E-commerce: Product clustering for recommendations (group similar products).

  • Healthcare: Group patients by symptoms or gene expression to find subtypes.

  • Finance: Detect anomalous transactions (fraud).

  • Cybersecurity: Identify unusual login patterns or scans as anomalies.

  • Manufacturing: Monitor sensor streams and detect equipment anomalies.

  • NLP: Topic modeling and document clustering (group similar articles).

  • Astronomy: Group stars/galaxies by spectral properties.


Summary & takeaways

  • Unsupervised learning discovers structure in unlabeled data: clusters, low-dimensional structure, and anomalies.

  • Clustering (K-Means, Hierarchical, DBSCAN) organizes data into groups — pick method by data shape, size, and noise.

  • Dimensionality reduction (PCA, t-SNE) helps visualization and reduces noise; PCA is linear and interpretable, t-SNE is for visualization only.

  • Preprocessing matters: scale numeric data, handle categorical features appropriately.

  • Evaluation is harder than supervised learning — rely on silhouette, domain knowledge, and qualitative checks.

  • Start simple: Try PCA + K-Means, visualize clusters, then iterate with more advanced techniques (DBSCAN, UMAP, deep clustering).


Thanks for reading 💗!


If you found this post useful:

⮕  Share it with others who might benefit.
⮕  Leave a comment with your thoughts or questions—I’d love to hear from you.
⮕  Follow/Subscribe to the blog for more helpful guides, tips, and insights.
