22-25 April 2026
SQLBits 2025

Mind the Gap: Bridging PDFs and SQL Server with AI

Tired of copying data from PDFs into SQL Server? See how to automate this process using AI and PowerShell. We'll explore practical techniques for extracting structured data from PDFs and loading it directly into SQL Server tables.
Every organization has valuable data trapped in PDFs, from invoices to medical records to compliance documents. This session demonstrates a practical solution that uses OpenAI's Structured Outputs, PowerShell, and SQL Server to automate the tedious work of re-keying that data by hand.

Through live demonstrations, I'll show you how to build a reliable pipeline that extracts data from PDFs and loads it directly into SQL Server tables. You'll see real examples using veterinary records, but the techniques apply to any PDF-based data. We'll explore how to handle common challenges like inconsistent formatting and missing data, and discuss strategies for improving accuracy.

The session includes practical demonstrations of:
- Converting PDFs to structured text using AI
- Creating effective JSON schemas for data validation
- Building a PowerShell pipeline for automated processing
- Loading the extracted data into SQL Server
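
To make the schema and validation steps above concrete, here is a minimal sketch in Python (the session itself uses PowerShell). The veterinary field names are hypothetical and not taken from the session. The schema follows the shape OpenAI documents for strict Structured Outputs: every property is listed in `required`, `additionalProperties` is false, and a value that may be missing from the PDF is typed as nullable. The small `validate` helper is a hand-rolled stand-in for a full JSON Schema validator and covers only the keywords used here.

```python
import json

# Hypothetical schema for a veterinary record; every field name is illustrative.
VET_RECORD_SCHEMA = {
    "type": "object",
    "properties": {
        "pet_name": {"type": "string"},
        "species": {"type": "string"},
        # A nullable type lets the model report "not present in the PDF".
        "weight_kg": {"type": ["number", "null"]},
        "vaccinations": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "vaccine": {"type": "string"},
                    "date_given": {"type": "string"},
                },
                "required": ["vaccine", "date_given"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["pet_name", "species", "weight_kg", "vaccinations"],
    "additionalProperties": False,
}

# Map the JSON Schema type names used above to Python types.
_TYPES = {
    "string": str,
    "number": (int, float),
    "object": dict,
    "array": list,
    "null": type(None),
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value conforms."""
    allowed = schema["type"]
    allowed = allowed if isinstance(allowed, list) else [allowed]
    # bool is a subclass of int in Python; this sketch has no boolean type,
    # so booleans are always rejected.
    if isinstance(value, bool) or not any(isinstance(value, _TYPES[t]) for t in allowed):
        return [f"{path}: expected {allowed}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing")
        for key, item in value.items():
            if key in props:
                errors.extend(validate(item, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}.{key}: unexpected field")
    elif isinstance(value, list):
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Example: a response body as it might come back from the model.
reply = json.loads(
    '{"pet_name": "Rex", "species": "dog", "weight_kg": null, '
    '"vaccinations": [{"vaccine": "rabies", "date_given": "2024-03-01"}]}'
)
print(validate(reply, VET_RECORD_SCHEMA))
```

Validating again on the client side is a design choice rather than a requirement: it catches hand-edited or cached responses that never passed through the model's schema enforcement.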

You will learn:
- How to implement OpenAI's Structured Outputs for data extraction
- Techniques for validating and cleaning AI-extracted data
- Methods for handling arrays and nested data structures in PDFs
- Tips for optimizing AI accuracy and reducing processing time
- Best practices for automating PDF-to-SQL workflows
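
One common way to handle arrays and nested structures is to split each extracted record into a parent row and a set of child rows, then insert both in a single transaction. The sketch below illustrates that idea in Python under loud assumptions: the table and column names are hypothetical, and the standard-library `sqlite3` module stands in for SQL Server so the example runs anywhere. Against a real SQL Server instance the same parameterized pattern would go through a driver such as pyodbc, with the generated key retrieved in a driver-appropriate way.

```python
import sqlite3


def flatten(record):
    """Split one nested record into a parent tuple and a list of child tuples."""
    parent = (record["pet_name"], record["species"], record["weight_kg"])
    children = [(v["vaccine"], v["date_given"]) for v in record["vaccinations"]]
    return parent, children


def load(conn, record):
    """Insert the parent row and its child rows atomically; return the new parent id."""
    parent, children = flatten(record)
    with conn:  # commits on success, rolls back if any insert raises
        cur = conn.execute(
            "INSERT INTO pet (pet_name, species, weight_kg) VALUES (?, ?, ?)",
            parent,
        )
        pet_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO vaccination (pet_id, vaccine, date_given) VALUES (?, ?, ?)",
            [(pet_id, vaccine, date_given) for vaccine, date_given in children],
        )
    return pet_id


# In-memory database as a stand-in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pet (
        pet_id INTEGER PRIMARY KEY,
        pet_name TEXT NOT NULL,
        species TEXT NOT NULL,
        weight_kg REAL            -- NULL when the PDF did not state a weight
    );
    CREATE TABLE vaccination (
        pet_id INTEGER NOT NULL REFERENCES pet (pet_id),
        vaccine TEXT NOT NULL,
        date_given TEXT NOT NULL
    );
    """
)

# Hypothetical extracted record with a nested array and a missing value.
sample = {
    "pet_name": "Rex",
    "species": "dog",
    "weight_kg": None,
    "vaccinations": [
        {"vaccine": "rabies", "date_given": "2024-03-01"},
        {"vaccine": "parvovirus", "date_given": "2024-03-15"},
    ],
}
new_id = load(conn, sample)
```

Passing values as parameters rather than building SQL strings matters here because the text comes from documents of unknown origin.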

This session is for database professionals looking to automate manual data entry from PDFs. Learn how AI can replace hours of copying and pasting with an automated solution.