Common Data Science Mistakes

2020

TL;DR

Over 20+ years of field experience, many common statistical and data science mistakes have surfaced. This session will tackle a couple of them.

Session Details

In the middle of deploying a model, a team of data scientists realizes that the predictions are "somewhat off". Troubleshooting is on the horizon, but what should they do? This session will guide you through the most common mistakes data scientists and statisticians make when preparing and engineering data using T-SQL or any other database system.

Furthermore, we will explore common statistical and data science mistakes made when modeling data, extracting know-how from the data, finding hidden patterns, and running different tests against the structural models, mainly using R, Python, or Spark. Each "what not to do" will be paired with a "what to do" explanation, using sample datasets and sample code.

Some statistical knowledge or background is a plus!

3 things you'll get out of this session

Tomaž Kaštrun's other proposed sessions for 2026

Building AI solutions using Azure AI Foundry - 2026

RegEx, JSON, Vector, API and other new T-SQL features for developers in SQL Server 2025 - 2026

Tomaž Kaštrun's previous sessions

Common Data Science Mistakes

The most common statistical and data science mistakes and traps that we all try to avoid!

Notebooks - standardized enterprise solution for better storytelling.

Delivering clear, business-supported decisions is a key element of good storytelling. Notebooks with formatted text and code can help your organisation tidy up reporting and daily operational tasks, and keep data management cleaner.