22-25 April 2026

Why RAG Fails for Structured Data: Text2SQL approach for structured data

Regular 50-minute session for SQLBits 2026

TL;DR

While LLMs excel at handling unstructured data through RAG, structured data like SQL tables and CSV files requires a different approach where schema relationships and precise queries matter more than semantic similarity. This talk demonstrates building a Data Analytics Agent using Python that translates natural language into executable Pandas or SQL code. We'll cover why chunk-based RAG fails for tabular data, how to architect Text-2-SQL systems with LLMs, and practical evaluation methods for these workflows.

Session Details

LLMs handle unstructured data well through RAG, but structured data such as SQL tables and CSV files needs a different approach. This talk demonstrates building a Data Analytics Agent that converts natural language into executable Python Pandas or SQL code. With finance sales data as the example, we'll cover how to architect effective Text-2-Command (SQL/Pandas) systems for production, why chunk-based RAG retrieval doesn't work for tabular data, and practical methods to evaluate generated queries. If you work with SQL or CSV data and want to integrate it with LLMs, this talk is for you.

Attendees will learn when to choose Text-2-Command over RAG and how to use Python REPL tools to execute SQL/Pandas code for analytics and visualization in response to user queries over structured data.
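As a rough illustration of the Text-2-Command flow described above, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its rows, and the `generate_sql` function are assumptions made up for this example; `generate_sql` is a hard-coded stand-in for whatever LLM call a real system would make, and the SELECT-only check is a deliberately crude guard, not a production safeguard.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory sales table (made-up demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("EMEA", "bonds", 120.0), ("EMEA", "funds", 80.0), ("APAC", "bonds", 50.0)],
    )
    return conn


def schema_prompt(conn: sqlite3.Connection) -> str:
    """Collect CREATE TABLE statements so the model sees the schema, not row chunks."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()
    return "\n".join(row[0] for row in rows)


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a fixed query for the demo."""
    return "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"


def run_readonly_query(conn: sqlite3.Connection, sql: str) -> list:
    """Execute the generated SQL, refusing anything that is not a SELECT."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are executed in this sketch")
    return conn.execute(sql).fetchall()


conn = build_demo_db()
sql = generate_sql("What are total sales per region?", schema_prompt(conn))
print(run_readonly_query(conn, sql))
```

The point of the sketch is that the answer comes from an aggregate over every row in the table, which is something retrieving a handful of similar text chunks cannot guarantee.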

3 things you'll get out of this session

- Understand the fundamental difference between unstructured and structured data from an LLM system design perspective
- Identify when Text-2-Command systems (SQL or Pandas) are a better architectural choice than RAG
- Understand common failure modes in LLM-generated SQL or Pandas and how to evaluate correctness
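One common way to check correctness is to run the generated query and a trusted reference query against the same database and compare the results. The sketch below assumes that approach; the `execution_match` helper, the `sales` table, and the sample queries are all hypothetical and are not taken from the session itself.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """Return True if both queries yield the same rows, ignoring row order."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        # Invalid SQL (e.g. a hallucinated column) counts as a failure.
        return False
    expected = conn.execute(reference_sql).fetchall()
    return Counter(got) == Counter(expected)


# Made-up demo data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)],
)

reference = "SELECT region, SUM(amount) FROM sales GROUP BY region"

candidates = {
    "correct, different row order": "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region DESC",
    "wrong aggregation": "SELECT region, COUNT(amount) FROM sales GROUP BY region",
    "hallucinated column": "SELECT region, SUM(revenue) FROM sales GROUP BY region",
}

for label, sql in candidates.items():
    print(label, "->", execution_match(conn, sql, reference))
```

Comparing results rather than query text means two differently written but equivalent queries both pass, while a query that runs without error but aggregates the wrong thing is still caught.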