Introduction to Data Science Lectures

Here is the material for a course I will be giving in a Master of Data Science and AI.


Unsupervised learning

Until now, we have explored supervised learning algorithms, i.e. settings in which the training data come with labels giving the “right answers”.

Now, we want to focus on unsupervised learning algorithms, and in particular on the class of clustering algorithms.

We do not have labels on the data; instead, we look for a way to group similar data points so that the algorithm assigns them to the same class.

Clustering

The main example of unsupervised learning is the class of clustering algorithms. The main goal of cluster analysis, or clustering, is the automatic discovery of natural groupings in data.

It is an unsupervised learning task because, unlike supervised learning (such as predictive modelling), clustering algorithms only interpret the input data and find natural groups, or clusters, in feature space.

There are several examples of clustering techniques, based either on distance or on point density in space. We are going to present two of them:

  1. $k$-means
  2. DBSCAN
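
To make the difference between the distance-based and the density-based points of view a little more concrete, here is a minimal sketch in plain Python. It is neither $k$-means nor DBSCAN, only the basic ingredient each one builds on. The toy points, the two fixed centres, the radius `eps`, and the helper names `assign_to_nearest` and `neighbour_counts` are all assumptions made up for this illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Toy unlabelled data (invented for this sketch): two tight groups
# plus one isolated point far from both.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
          (10.0, 0.0)]


def assign_to_nearest(points, centres):
    """Distance-based view: give each point the index of its closest centre."""
    return [min(range(len(centres)), key=lambda j: dist(p, centres[j]))
            for p in points]


def neighbour_counts(points, eps):
    """Density-based view: count the points within radius eps of each point
    (the point itself is included in the count)."""
    return [sum(dist(p, q) <= eps for q in points) for p in points]


# Hypothetical fixed centres: here they are chosen by hand, not learned.
centres = [(0.0, 0.0), (5.0, 5.0)]

labels = assign_to_nearest(points, centres)
counts = neighbour_counts(points, eps=0.5)

print(labels)  # [0, 0, 0, 1, 1, 1, 1]
print(counts)  # [3, 3, 3, 3, 3, 3, 1]
```

Note that no labels are used anywhere: both functions only look at the input points. The distance-based assignment is forced to put the isolated point `(10.0, 0.0)` into one of the two groups, while the neighbour count singles it out as lying in a low-density region.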