Data Integrator
Spark for Data Engineers
Description
Data analysts, data scientists, business intelligence analysts and many other roles require data on demand. Fighting with data silos, many scattered databases, Excel files, CSV files, JSON files, APIs and potentially different flavours of cloud storage can be tedious, nerve-wracking and time-consuming.
An automated process that follows a set of steps and procedures to take subsets of data, columns from databases and binary files, and merge them together to serve business needs is, and will remain, a core job for many organizations and teams.
Apache Spark™ is designed to build faster and more reliable data pipelines. It covers both the low-level and structured APIs and brings tools and packages for streaming data, machine learning, data engineering, building pipelines and extending the Spark ecosystem.
Spark is an absolute winner for these tasks and a great choice for adoption.
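As a minimal, hedged illustration of the structured API mentioned above (assuming a local Spark installation; the file sales.csv and its region/amount columns are hypothetical):

```python
# A minimal sketch of Spark's structured API, assuming a local installation
# and a hypothetical sales.csv file with "region" and "amount" columns.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("structured-api-sketch")
         .master("local[*]")
         .getOrCreate())

# Read a CSV file into a DataFrame, letting Spark infer the schema.
sales = (spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv("sales.csv"))

# Aggregate with the structured API: total and average amount per region.
summary = (sales.groupBy("region")
           .agg(F.sum("amount").alias("total_amount"),
                F.avg("amount").alias("avg_amount")))

summary.show()
spark.stop()
```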
Data engineers should have the breadth and capability to cover:
- System architecture
- Programming
- Database design and configuration
- Interface and sensor configuration
In addition, as important as familiarity with the technical tools is, the concepts of data architecture and pipeline design matter even more. The tools are worthless without a solid conceptual understanding of (a short sketch follows the list below):
- Data models
- Relational and non-relational database design
- Information flow
- Query execution and optimisation
- Comparative analysis of data stores
- Logical operations
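To make the query execution and optimisation point concrete, the sketch below inspects the physical plan Spark produces for a simple query; the file orders.parquet and its columns are assumptions used only for illustration.

```python
# A hedged sketch of inspecting query execution in Spark; the file name
# orders.parquet and its columns are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

orders = spark.read.parquet("orders.parquet")

# A filter followed by an aggregation.
result = (orders.filter(orders.status == "SHIPPED")
          .groupBy("customer_id")
          .count())

# explain() shows how Catalyst turns this into a physical plan, including
# filter pushdown into the Parquet scan and the exchange (shuffle)
# introduced for the aggregation.
result.explain(mode="formatted")
```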
Apache Spark has all of this technology built in to cover these topics, and it provides the capacity to assemble functional systems that achieve a concrete goal.
Workshop Title: "Spark for Data Engineers"
Target Audience: Data Engineer, BI Engineer, Cloud Data Engineer
Broader Audience: Analysts, BI Analysts, Big Data Analysts, DevOps Data Engineer, Machine Learning Engineer, Statisticians, Data Scientists, Database Administrators, Data Orchestrators, Data Architects
Prerequisite knowledge for attendees:
Data engineering tasks:
- analyzing and organizing raw data (with T-SQL or Python or R or Scala)
- building data transformations and pipelines (with T-SQL or Python or R or Scala; a minimal sketch of the expected level follows below)
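As a rough indication of the expected prerequisite level, a small Python (pandas) example of organizing raw data is shown below; the file raw_orders.csv and its columns are hypothetical.

```python
# A sketch of the prerequisite skill level, using pandas as one of the
# possible Python options; raw_orders.csv and its columns are assumptions.
import pandas as pd

raw = pd.read_csv("raw_orders.csv")

# Typical raw-data organisation: drop duplicates, fix types, filter bad rows.
cleaned = (raw.drop_duplicates()
           .assign(order_date=lambda df: pd.to_datetime(df["order_date"]))
           .query("amount > 0"))

# A simple transformation: revenue per customer.
revenue = cleaned.groupby("customer_id")["amount"].sum().reset_index()
print(revenue.head())
```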
Technical prerequisites for attendees:
- a working laptop with the ability to install Apache Spark and other tools
- Internet access
- Credentials and (free) credit for accessing the Azure portal
Agenda for the day (9AM – 5PM; start and end times can vary and will be finalised with the organizer)
1. Module 1 (9.00 – 10.00): Getting to know Apache Spark, installation and setting up the environment
2. Coffee Break 15'
3. Module 2 (10.00 – 11.15): Creating Datasets, organising raw data and working with structured APIs
4. Coffee Break 15'
5. Module 3 (11.30 – 13.00): Designing and building pipelines, moving data and building data models with Spark
6. Lunch: 13.00 – 14.00
7. Module 4 (14.00 – 15.00): Data and process orchestration, deployment and Spark Applications
8. Coffee Break 15'
9. Module 5 (15.15 – 16.15): Data Streaming with Spark
10. Module 6 (16.15 – 17.00): Ecosystem, tooling and community
All modules have hands-on material that will be given to attendees at the beginning of the training.
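As a hedged illustration of what the Module 5 hands-on material might look like, the sketch below runs a classic Structured Streaming word count; the socket source, host and port are assumptions chosen only to keep the example self-contained (a feed can be simulated locally with `nc -lk 9999`).

```python
# A sketch of a Structured Streaming exercise; the socket source, host and
# port are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("streaming-sketch")
         .master("local[*]")
         .getOrCreate())

# Read a stream of text lines from a local socket.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split each line into words and keep a running count per word.
counts = (lines.select(F.explode(F.split(lines.value, " ")).alias("word"))
          .groupBy("word")
          .count())

# Write the running counts to the console until the query is stopped.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())

query.awaitTermination()
```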
Feedback link: https://sqlb.it/?6188
Learning Objectives
Previous Experience
Tech Covered
Azure, Spark, Deployment, On-Premises, Data Integrator, Managing Big Data