Introduction to Data Science Lectures

Here is the material for a course I will be giving in a Master of Data Science and AI.

Software tools

This lecture aims to introduce two powerful tools:

  • Tableau for visual data analysis.

  • Git for code versioning and GitHub for collaboration.

Tableau Public

We are going to use the public version, Tableau Public, to create some dashboards. To download the software, click on the badge and you will be redirected to the download page.

Data and dashboards

A comprehensive, regularly updated repository of football data can be found here.

A nice dashboard published on Tableau Public by Paul Riley, the developer of one of the most commonly used models for calculating Expected Goals.

Git

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

It is difficult to overestimate the importance of Git. It is enough to say that, in modern software development, almost every company uses Git in some form to version its code.
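To make the idea of versioning concrete, here is a minimal sketch of a first Git session, assuming Git is installed and available on the command line. The directory name `demo-project`, the file `analysis.py`, and the identity values are hypothetical placeholders chosen for illustration.

```shell
# Create a throwaway project directory (the name is a placeholder).
mkdir demo-project
cd demo-project

# Turn the directory into a Git repository.
git init

# Set an identity for this repository only (placeholder values).
git config user.name "Student Name"
git config user.email "student@example.com"

# Create a file, stage it, and record a first snapshot (commit).
echo "print('hello')" > analysis.py
git add analysis.py
git commit -m "Add first version of analysis script"

# Change the file and record a second snapshot.
echo "print('hello, data science')" > analysis.py
git commit -am "Update greeting"

# Show the recorded history, one commit per line.
git log --oneline
```

Each commit is a snapshot of the project that you can inspect or return to later, which is what "versioning code" means in practice.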

xkcd

GitHub

First of all, let us state an important fact: Git is not GitHub.

Indeed, Git is the tool, while GitHub is one of the services that host projects using Git.

GitHub is an Internet hosting service for software development and version control based on Git. A huge number of projects are hosted in GitHub repositories, including these lectures.
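The distinction can be seen directly on the command line. The sketch below, which assumes Git is installed, creates a repository that exists only on the local machine and then registers a GitHub repository as a remote. The directory name `local-only` and the URL are hypothetical placeholders; replace the URL with a repository you own.

```shell
# A repository created with Git alone lives entirely on the local machine.
mkdir local-only
cd local-only
git init

# No remote is configured yet, so this prints nothing.
git remote -v

# Register a GitHub repository as the remote named "origin".
# The URL below is a placeholder, not a real repository.
git remote add origin https://github.com/your-username/local-only.git
git remote -v

# Publishing needs network access and credentials, so it is left commented out.
# The branch name "main" is an assumption; it depends on your Git configuration.
# git push -u origin main
```

Everything up to `git remote add` works without an account or a network connection. GitHub only enters the picture when you want to publish or share the repository.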