Why Your SQL is Slow: Unmasking Hidden Performance Traps in Apache Pinot
Regular 50 minute session for SQLBits 2026TL; DR
This talk tackles a universal developer pain point: writing a "perfect" SQL query that runs inexplicably slowly. Using the Apache Pinot distributed database as a practical case study, I will demystify why this happens.
The session dives into how developers unknowingly sabotage their own queries through subtle NULL semantics, innocent-looking functions, and ORM-generated code that breaks the relational algebra rules core to any modern optimizer.
This is a highly practical, "code-and-consequences" session that will give attendees a new mental model for writing performant SQL that applies not just to Pinot, but to any database they use.
Session Details
As developers, we've all been there: you write a SQL query that looks perfectly clean and correct, but it runs unexpectedly slow. You check the execution plan and find the database has chosen a brute-force, inefficient path. This is a problem in single-node databases like SQL Server, but it is even worse in distributed ones.
Apache Pinot is a distributed OLAP database that can run complex SQL expressions (including window functions and joins) in milliseconds across billions of rows. However, its optimizer applies complex relational algebra transformations while remaining blind to business invariants known only to humans (and sometimes not even them).
This means that a seemingly harmless function call, an innocent offset or limit, the subtle properties of NULL handling, or a slight change in a JOIN condition can unknowingly prevent these transformations, forcing the engine to abandon powerful optimizations. To make matters worse, ORMs like Hibernate are notorious for generating queries that fall into these exact traps.
While this talk uses Apache Pinot for its examples, the concepts are universal. Whether you're using ClickHouse, SQL Server, or even NoSQL databases like MongoDB, the principles for writing optimizer-friendly queries remain the same. Using concrete, real-world examples, I will demonstrate how seemingly equivalent SQL queries can produce wildly different execution plans. You'll leave this talk armed with the knowledge to spot these "optimization killers" and craft queries that work with the database, not against it, ensuring your analytics are always blazingly fast.
Apache Pinot is a distributed OLAP database that can run complex SQL expressions (including window functions and joins) in milliseconds across billions of rows. However, its optimizer applies complex relational algebra transformations while remaining blind to business invariants known only to humans (and sometimes not even them).
This means that a seemingly harmless function call, an innocent offset or limit, the subtle properties of NULL handling, or a slight change in a JOIN condition can unknowingly prevent these transformations, forcing the engine to abandon powerful optimizations. To make matters worse, ORMs like Hibernate are notorious for generating queries that fall into these exact traps.
While this talk uses Apache Pinot for its examples, the concepts are universal. Whether you're using ClickHouse, SQL Server, or even NoSQL databases like MongoDB, the principles for writing optimizer-friendly queries remain the same. Using concrete, real-world examples, I will demonstrate how seemingly equivalent SQL queries can produce wildly different execution plans. You'll leave this talk armed with the knowledge to spot these "optimization killers" and craft queries that work with the database, not against it, ensuring your analytics are always blazingly fast.
3 things you'll get out of this session
- Spot and refactor common but problematic SQL patterns
- Examples from Apache Pinot but applicable to other distributed systems
- See SQL from the database's perspective as a declarative structure that can be transformed through relational algebra
- Examples from Apache Pinot but applicable to other distributed systems
- See SQL from the database's perspective as a declarative structure that can be transformed through relational algebra