Apache Iceberg Data Analytics with Python
Proposed session for SQLBits 2026
TL;DR
In this talk, we’ll explore how to unlock the full potential of Iceberg for data analytics in Python. From working with PyIceberg for low-level Iceberg table operations to leveraging Apache Polaris for catalog management, we’ll cover the essential tools and libraries available for Python users.
Session Details
Python continues to be a leading language for data analytics, and Apache Iceberg has emerged as a powerful table format for managing large-scale datasets. In this talk, we’ll explore how to unlock the full potential of Iceberg for data analytics in Python. From working with PyIceberg for low-level Iceberg table operations to leveraging Apache Polaris for catalog management, we’ll cover the essential tools and libraries available for Python users.
We’ll also dive into DataFusion, a high-performance query engine that can work with Iceberg tables, and the dremio-simple-query library, which simplifies querying Iceberg tables through Dremio. The session includes hands-on examples, best practices, and real-world scenarios that show how to combine Python’s flexibility with Iceberg’s scalability for analytics workloads. Whether you’re a data scientist, engineer, or analyst, you’ll leave with practical insights into building a Python-powered data analytics pipeline with Apache Iceberg.
3 things you'll get out of this session
Learn about Iceberg and how to use it with Python
Speakers
Alex Merced's other proposed sessions for 2026
SQL For Everything: Structure, Unstructured, SQLServer, Onelake - 2026