Introduction to Data Science Lectures

Here is the material for a course I will be giving in a Master of Data Science and AI.


Data pipelines

This module is about data pipelines, data preprocessing and techniques to encode non-numerical features.

Data preprocessing

Pandas is a great tool for data analysis. With it, we work through some common problems and collect hints that help apply machine learning algorithms properly.

We are going to talk about data preprocessing, the mixed blessing of all data scientists.

Data Encoding

Data encoding is one of the skills a data scientist is required to have. We learn how to encode non-numerical variables in a form a machine learning algorithm can digest, i.e. numbers.
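
As a rough illustration of what encoding means, here is a minimal sketch of one-hot encoding, one common way to turn categories into numbers. It uses only the Python standard library so the transformation itself is visible. The helper `one_hot_encode` and the `colors` sample are hypothetical names invented for this example, not part of the course material. In practice a library routine such as `pandas.get_dummies` would usually do this job.

```python
def one_hot_encode(values):
    """Turn a list of categorical values into rows of 0/1 indicators.

    Hypothetical helper for illustration only.
    """
    # Sort the distinct categories so the column order is reproducible.
    categories = sorted(set(values))
    # One row per input value, one column per category.
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


# Made-up sample data.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for value, row in zip(colors, encoded):
    print(value, row)
```

Each row contains exactly one 1, in the column of its category, so no artificial ordering between categories is introduced.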

Data Scaling

Having features on the same scale is useful both from the analysis point of view and from the numerical point of view, as numerical performance benefits from it.

We analyse an example in some detail to see how to manage data issues such as differences in scale and to help machine learning models learn better.
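
To make the idea of differing scales concrete, here is a small sketch of two common rescaling schemes, min-max scaling and standardization, written with the standard library only. The function names and the `ages` and `incomes` samples are assumptions made up for this illustration, and this is not necessarily the example analysed in the course. Scikit-learn's `MinMaxScaler` and `StandardScaler` are typical choices in real pipelines.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1].

    Assumes the values are not all identical (max > min).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and rescale to unit (population) standard deviation.

    Assumes the standard deviation is non-zero.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up features living on very different scales.
ages = [20, 30, 40, 50]
incomes = [20000, 40000, 60000, 80000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
standardized_incomes = standardize(incomes)

print(scaled_ages)
print(scaled_incomes)
print(standardized_incomes)
```

After min-max scaling both features span the same range, so neither dominates a distance-based or gradient-based algorithm merely because of its units.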