The Agenda

SQLBits 2024 runs from Tuesday 19th – Saturday 23rd March.

Machine Learning - how to be a superstar

Description

Machine Learning (ML) is a process. The steps are:

  • Select the data
  • Understand it
  • Clean it
  • Repeat
      • Select an ML algorithm
      • Prepare the data
      • Repeat (with model)
          • Train
          • Test
          • Evaluate
      • Until best model for this algorithm found
  • Until best overall model found
  • Combine models into an ensemble
  • Deploy model
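
The steps above can be sketched in Python with pandas and scikit-learn. This is a minimal illustration under stated assumptions, not the workshop's own code: the built-in breast cancer dataset stands in for real data, the two algorithms and their parameter values are arbitrary choices, and names such as `candidates` and `best_per_algorithm` are hypothetical. Deployment is not shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Select the data (assumption: a built-in dataset stands in for real data).
df = load_breast_cancer(as_frame=True).frame

# Understand it: summary statistics and a count of missing values.
summary = df.describe()
missing_values = int(df.isna().sum().sum())

# Clean it: drop incomplete rows (this dataset has none, so nothing is lost).
df = df.dropna()

X, y = df.drop(columns="target"), df["target"]

# Keep a final test set aside, then split the rest into fit and validation rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0, stratify=y_train
)

# Hypothetical candidates: each algorithm with a few settings to try.
# Preparing the data (scaling) is part of the logistic regression pipeline.
candidates = {
    "tree": [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (2, 4, 8)],
    "logreg": [
        make_pipeline(StandardScaler(), LogisticRegression(C=c, max_iter=1000))
        for c in (0.1, 1.0, 10.0)
    ],
}

best_per_algorithm = {}
for name, models in candidates.items():        # Repeat: select an ML algorithm
    best_model, best_score = None, -1.0
    for model in models:                       # Repeat (with model)
        model.fit(X_fit, y_fit)                # Train
        predictions = model.predict(X_val)     # Test
        score = accuracy_score(y_val, predictions)  # Evaluate
        if score > best_score:                 # Until best model for this algorithm
            best_model, best_score = model, score
    best_per_algorithm[name] = (best_model, best_score)

# Combine the best model of each algorithm into an ensemble.
ensemble = VotingClassifier(
    estimators=[(name, model) for name, (model, _) in best_per_algorithm.items()],
    voting="soft",
)
ensemble.fit(X_train, y_train)
test_accuracy = ensemble.score(X_test, y_test)
print(f"Ensemble accuracy on held-out test data: {test_accuracy:.3f}")
```

The sketch selects models on validation rows and reports accuracy on a separate test set, so the final figure is not inflated by the selection loop.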

Some steps sound really exciting (training the model) and some painfully boring (preparing data), but all are equally important: neglect any one and you are wasting your time. We will show you why all these steps are necessary and how to complete them. This workshop is not suitable for people who are already doing ML well, but (we hope) it will be incredibly useful for people who are about to start using ML and need to understand the entire process from start to finish.

Pre-requisites
Some knowledge of R or Python would be useful. Some code will be available, and a brief time will be set aside at the end of the workshop to run it. However, the day is mostly about how and why machine learning works, so it will be perfectly possible to attend, understand and (hopefully) enjoy the day without a background knowledge of either language and without running any code. On the other hand, the data and code will be available, so if you are familiar with these languages you can try ML for yourself.
Laptop Required: Optional

  • Software: Bear in mind that we have designed the workshop so that coding is not essential. However, if you wish to use R, you should download it together with the RStudio IDE. In addition, you should download and install the libraries rpart, readr, caTools and neuralnet. Descriptions and videos of how to perform these downloads are widely available online. If you would like to use Python, we recommend you install Spyder with the Anaconda distribution, or use whichever Python IDE you already have. The required packages are pandas, scikit-learn (both included with Anaconda) and graphviz (available through pip or conda).
  • Subscriptions: No.
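
For attendees taking the Python route, a quick way to confirm that the packages listed above are installed is sketched below. This is an illustrative check only; the helper name `missing_packages` is hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


# Import names for the packages named in the Software note.
required = ["pandas", "sklearn", "graphviz"]

missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Anything reported as not installed can be added with pip or conda before the workshop.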

Learning Objectives

Previous Experience

Tech Covered

Machine Learning, ML, R, Python