Data pipelines
This module covers data pipelines, data preprocessing, techniques for encoding non-numerical features, and feature scaling.
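A data pipeline is a fixed sequence of preparation steps, each taking data in and handing transformed data to the next. The sketch below is one minimal way to express that idea with pandas, assuming each step is a plain function that takes and returns a DataFrame so the steps can be chained with `DataFrame.pipe`. The column names and the step functions `fill_missing`, `encode` and `scale` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data: one numeric column with a gap, one categorical column.
raw = pd.DataFrame({
    "age": [25.0, None, 47.0, 31.0],
    "city": ["Rome", "Milan", "Rome", "Turin"],
})

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    # Replace missing values in "age" with that column's median.
    return df.fillna({"age": df["age"].median()})

def encode(df: pd.DataFrame) -> pd.DataFrame:
    # One-hot encode "city" into 0/1 indicator columns.
    return pd.get_dummies(df, columns=["city"], dtype=int)

def scale(df: pd.DataFrame) -> pd.DataFrame:
    # Min-max scale "age" into the range [0, 1].
    out = df.copy()
    col = out["age"]
    out["age"] = (col - col.min()) / (col.max() - col.min())
    return out

# Each step returns a new DataFrame, so the steps chain left to right.
prepared = raw.pipe(fill_missing).pipe(encode).pipe(scale)
print(prepared)
```

Keeping every step as a separate function makes the order explicit and lets the same sequence be re-applied unchanged to new data.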
Data preprocessing
Pandas is a powerful tool for data analysis. With it, we will work through some common data problems and collect practical hints for preparing data so that machine learning algorithms can be applied properly.
We are going to talk about data preprocessing, the mixed blessing of every data scientist.
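Typical first preprocessing steps in pandas are removing duplicate records, making sure numeric columns really hold numbers, and checking how many values are missing. The following is a small sketch of those three steps on hypothetical data; the column names `product` and `price` are illustrative only.

```python
import pandas as pd

# Hypothetical raw records: one duplicated row and a price stored as text,
# including one entry that cannot be parsed as a number.
raw = pd.DataFrame({
    "product": ["a", "b", "b", "c"],
    "price": ["10.5", "7", "7", "n/a"],
})

# 1. Drop exact duplicate rows and renumber the index.
clean = raw.drop_duplicates().reset_index(drop=True)

# 2. Convert the text column to numbers; unparseable values become NaN.
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")

# 3. Count missing values per column to decide how to handle them.
missing = clean.isna().sum()
print(clean)
print(missing)
```

Using `errors="coerce"` turns bad entries into explicit missing values instead of raising an error, so they show up in the missing-value count and can be handled deliberately.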
Data Encoding
Data encoding is one of the core skills a data scientist needs. We learn how to encode non-numerical variables in a form that a machine learning algorithm can digest, i.e. numbers.
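Two common encodings are ordinal encoding, for categories with a natural order, and one-hot encoding, for categories without one. A minimal pandas sketch of both follows, assuming hypothetical columns `color` (unordered) and `size` (ordered S < M < L); the mapping dictionary `size_order` is an illustrative choice, not a fixed convention.

```python
import pandas as pd

# Hypothetical data: "color" has no natural order, "size" does.
df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "size": ["S", "L", "M", "S"],
})

# Ordinal encoding: map each ordered level to an integer that preserves the order.
size_order = {"S": 0, "M": 1, "L": 2}
df["size_code"] = df["size"].map(size_order)

# One-hot encoding: one 0/1 column per color, so no false order is implied.
encoded = pd.get_dummies(df.drop(columns="size"), columns=["color"], dtype=int)
print(encoded)
```

Mapping unordered colors to 0, 1, 2 would suggest that "blue" lies between "red" and "green", which is why one-hot encoding is usually preferred for nominal variables.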
Data Scaling
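Bringing features onto comparable scales is useful both from the analysis point of view and from the numerical point of view: scale-sensitive methods, such as distance-based models and algorithms trained by gradient descent, tend to work better and converge faster when no single feature dominates simply because of its units.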
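Two standard rescalings are min-max scaling, x' = (x - min) / (max - min), which maps each feature onto [0, 1], and standardization, z = (x - mean) / std, which gives each feature zero mean and unit standard deviation. The sketch below applies both column-wise with plain pandas arithmetic on hypothetical `income` and `age` columns. It assumes the population standard deviation (`ddof=0`); pandas' own default is `ddof=1`.

```python
import pandas as pd

# Hypothetical features on very different scales.
df = pd.DataFrame({
    "income": [20_000.0, 35_000.0, 50_000.0, 95_000.0],
    "age": [22.0, 35.0, 41.0, 58.0],
})

# Min-max scaling: each column is mapped linearly onto [0, 1].
minmax = (df - df.min()) / (df.max() - df.min())

# Standardization (z-score): zero mean and unit population standard deviation.
standard = (df - df.mean()) / df.std(ddof=0)

print(minmax)
print(standard)
```

In a real workflow the min, max, mean and standard deviation should be computed on the training data only and then reused on new data, so that no information leaks from the test set.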
We analyse an example in some detail to see how to manage data issues such as differences in scale, and how handling them helps machine learning models learn better.
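A quick way to see why differences in scale matter is to look at Euclidean distances, which many models (for example nearest-neighbour methods) rely on. In this hypothetical three-point example, a 2,000-euro income gap outweighs a 35-year age gap on the raw data, so the nearest neighbour of `p0` changes once both features are standardized. The data and the helper `nearest_to_p0` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical customers: income in euros, age in years.
df = pd.DataFrame(
    {"income": [50_000.0, 52_000.0, 53_000.0], "age": [25.0, 60.0, 27.0]},
    index=["p0", "p1", "p2"],
)

def nearest_to_p0(features: pd.DataFrame) -> str:
    # Euclidean distance from row p0 to every other row; return the closest label.
    diffs = features.drop(index="p0") - features.loc["p0"]
    distances = ((diffs ** 2).sum(axis=1)) ** 0.5
    return distances.idxmin()

# On raw data, income differences (thousands) swamp age differences (tens).
raw_neighbor = nearest_to_p0(df)

# After standardization both features contribute on comparable scales.
scaled = (df - df.mean()) / df.std(ddof=0)
scaled_neighbor = nearest_to_p0(scaled)

print(raw_neighbor, scaled_neighbor)
```

On the raw data `p1` is closest even though its age is very different; after scaling, `p2`, which is similar in both income and age, becomes the nearest neighbour.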