Introduction to Data Science Lectures

Here is the material for a course I will be giving in a Master of Data Science and AI.


Supervised Learning

In this lecture we focus on supervised learning problems.

We will explore the two main classes of such problems: regression and classification.

Regression problems

These problems concern the prediction of continuous values. We will frame the problem as a supervised learning task by building training pairs (an input together with its observed target) and fitting a model to them.
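To make the idea of training pairs concrete, here is a minimal sketch that fits a straight line by ordinary least squares. The data, the choice of a linear model, and the names `fit_line` and `predict` are all assumptions made for illustration, not part of the course material.

```python
# Hypothetical toy data: inputs xs and continuous targets ys (roughly y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

# Training pairs (x_i, y_i): each input is coupled with its observed target.
training_pairs = list(zip(xs, ys))


def fit_line(pairs):
    """Closed-form least squares for the model y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(training_pairs)


def predict(x_new):
    """Predict a continuous value for an input not seen during training."""
    return slope * x_new + intercept
```

Once fitted, `predict(5.0)` returns the model's estimate for an input outside the training set, which is the sense in which a regression model predicts continuous values.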

Classification problems

These are the classical “assign objects to sets” problems. We are going to see how to predict the probability that an object belongs to a certain class and how to define a decision rule for the assignment.
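The step from predicted probabilities to an assignment can be sketched as follows. The threshold of 0.5, the example probabilities, and the function names `decide` and `decide_multiclass` are assumptions chosen for illustration; in practice the threshold depends on the cost of each kind of error.

```python
def decide(probabilities, threshold=0.5):
    """Binary rule: assign class 1 when P(class 1) >= threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


def decide_multiclass(class_probabilities):
    """Multiclass rule: pick the label with the highest predicted probability."""
    return max(class_probabilities, key=class_probabilities.get)


# Hypothetical model outputs: P(class 1) for four objects.
predicted_probs = [0.10, 0.48, 0.50, 0.93]

default_labels = decide(predicted_probs)
# A stricter rule assigns class 1 only when the model is very confident.
strict_labels = decide(predicted_probs, threshold=0.9)
```

The same probabilities lead to different assignments under different thresholds, which is why the decision rule is treated as a separate choice from the probability model.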

Logistic Regression 🎚️

We present the simplest and most popular classification algorithm: Logistic Regression.

This is based on the sigmoid function, defined by

\[f(z) := \frac{1}{1+e^{-z}}\, .\]

We use this function as a regressor for the probability of belonging to a specified class.
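As a rough sketch of how the sigmoid becomes a probability model, the code below applies it to a linear score `w * x + b` and fits `w` and `b` by plain gradient descent on the log-loss. The one-feature setup, the toy data, the learning rate, and the number of epochs are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """The sigmoid f(z) = 1 / (1 + e^{-z}), mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of class 1 for a single feature x under the model."""
    return sigmoid(w * x + b)


def fit(xs, ys, lr=0.1, epochs=2000):
    """Gradient descent on the average log-loss, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict_proba(x, w, b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: small x values belong to class 0, large ones to class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit(xs, ys)
```

After fitting, `predict_proba(x, w, b)` is below 0.5 for small inputs and above 0.5 for large ones, so the output can be read as an estimated class probability.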

Decision Tree 🌳

A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.
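The structure described above can be sketched with nested dictionaries: internal nodes hold the attribute to test, branches are keyed by the test outcome, and leaves hold a class label. The tree below is hand-built and hypothetical (the attributes `outlook` and `windy` are invented for the example); it is not learned from data.

```python
# Internal nodes test one attribute; leaves carry a class label.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {
                True: {"label": "stay home"},
                False: {"label": "play"},
            },
        },
        "rainy": {"label": "stay home"},
    },
}


def classify(node, sample):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while "label" not in node:
        outcome = sample[node["attribute"]]
        node = node["branches"][outcome]
    return node["label"]


def rules(node, path=()):
    """List every root-to-leaf path as a (conditions, label) classification rule."""
    if "label" in node:
        return [(path, node["label"])]
    found = []
    for outcome, child in node["branches"].items():
        found += rules(child, path + ((node["attribute"], outcome),))
    return found
```

Calling `classify(tree, {"outlook": "sunny", "windy": False})` walks two tests before reaching a leaf, while `rules(tree)` enumerates the paths from root to leaf as explicit rules.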

Error metrics

We need to define a metric to measure how good our model's response is. We examine the most suitable metrics for different tasks and introduce the concepts of satisficing and optimising metrics.
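A few common metrics can be sketched in plain Python: mean squared error for regression, and accuracy, precision and recall for classification. The last function illustrates one common reading of the two kinds of metric, which is an assumption here: one metric only has to meet a threshold (latency, in this invented example), and among the models that meet it we maximise the optimising metric (accuracy). The candidate models and their numbers are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Regression metric: average squared difference between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Classification metric: fraction of correctly assigned labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def select_model(candidates, max_latency_ms):
    """Keep models that meet the latency threshold, then maximise accuracy."""
    feasible = [m for m in candidates if m["latency_ms"] <= max_latency_ms]
    return max(feasible, key=lambda m: m["accuracy"]) if feasible else None


# Hypothetical candidates: model A is the most accurate but too slow.
candidates = [
    {"name": "A", "accuracy": 0.95, "latency_ms": 300},
    {"name": "B", "accuracy": 0.92, "latency_ms": 80},
    {"name": "C", "accuracy": 0.90, "latency_ms": 40},
]
best = select_model(candidates, max_latency_ms=100)
```

With a 100 ms limit the selection skips the most accurate model because it violates the threshold, and returns the most accurate of the remaining ones.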

Side note

To shake up your thinking, we suggest reading this evergreen Medium post titled “There is no classification” by the Head of Decision Intelligence at Google. It is inspiring and interesting in any case.