Overview
A workshop on agent-first data systems, agents for data science and analytics, and the future of data systems (part of CAIS 2026, The ACM Conference on AI and Agentic Systems, May 26, 2026, San Jose, California)
Today’s data systems were designed for a small number of careful human operators, but a growing share of analytics, data engineering, and ML workflows is now delegated to AI agents. This workshop will bring together researchers and practitioners to study how data systems should evolve for agents, and how agents themselves can help shape better systems.
The workshop grew out of recent work in data management but extends to other directions at the intersection of data and agents. Our goal is to explore the full design space where data systems and AI agents meet, and we are open to creative interpretations of the theme.
Organizers
- Elaine Ang, Columbia University
- Shu Liu, UC Berkeley
- Aditya Parameswaran, UC Berkeley
- John Dickerson, Mozilla AI
- Jonathan Frankle, Databricks
- Jacopo Tagliabue, Bauplan Labs
Invited Speakers
- Andy Pavlo, Carnegie Mellon University
- Aaron Katz, ClickHouse
- Nikita Shamgunov, Neon / Databricks
Speaker bios
Andy Pavlo — Andy Pavlo is an Associate Professor with Indefinite Tenure of Databaseology in the Computer Science Department at Carnegie Mellon University. His (unnatural) infatuation with database systems has inadvertently caused him to incur several distinctions, such as the IEEE TCDE Ramez Elmasri Outstanding Database Education Award (2026), VLDB Early Career Award (2021), NSF CAREER (2019), Sloan Fellowship (2018), and the ACM SIGMOD Jim Gray Best Dissertation Award (2014). He also was the CEO & co-founder of the OtterTune database tuning start-up (2020-2024), but it died an untimely death. He is currently the CEO and co-founder of “SO-YOU-DONT-HAVE-TO INCORPORATED'); DROP TABLE companies; --” (2025-). Andy earned his Ph.D. in 2013 at Brown University under Stan Zdonik and Mike Stonebraker. He knows some pile about databases.
Aaron Katz — Aaron Katz is currently Co-Founder and CEO of ClickHouse, Inc., the company behind ClickHouse, the industry-leading online analytical processing database management system. With more than 20 years of experience building and leading global teams, Aaron brings a unique perspective with a focus on international business, scale, and distribution. Most recently, Aaron led the GTM efforts at Elastic (NYSE: ESTC) between 2014 and 2020, where he helped grow the company from ~$5M in revenue when he joined to >$500M in revenue as a Section 16 officer when he left. Prior to Elastic, Aaron spent 12 years (2002-2014) at salesforce.com (NYSE: CRM), where he held a variety of international sales leadership roles and helped grow the company from a private, ~200-employee startup to a >$200B market leader. Aaron holds a Bachelor of Science degree in Managerial Economics from the University of California, Davis and lives in the San Francisco Bay Area with his wife and two children.
Nikita Shamgunov — Nikita Shamgunov is a database systems entrepreneur and engineer, and the co-founder and CEO of Neon, the serverless Postgres company acquired by Databricks in 2025. Before Neon, he co-founded SingleStore, formerly MemSQL, where he served as founding CTO and then CEO, helping build a distributed SQL database for real-time operational and analytical workloads. Earlier in his career, he worked on SQL Server at Microsoft and was a senior engineer at Facebook. Across two decades in database infrastructure, he has worked on systems spanning on-prem engines, distributed SQL, cloud-native databases, and serverless Postgres — and now sits at the intersection of databases and AI-agent workloads.
Awards
We are pleased to offer awards for outstanding contributions:
- MongoDB Best Paper Award — $1,000, recognizing the strongest accepted contribution to the workshop.
- Datadog Best Student Paper Award — $1,000, for the best paper with a student as the primary author.
Awards will be presented at the end of the day during a social gathering with drinks and informal discussion — register for the happy hour to save your spot!
Panel
The workshop will conclude with a panel discussion bringing together different perspectives on agentic data systems, from infrastructure and optimization to safety and deployment. The panel will be moderated by Ciro Greco.
- Ashish Kumar, MongoDB
- Anant Jhingran, IBM Software
- Junaid Ahmed, Datadog
- Anupam Datta, Snowflake
Panelist bios
Ashish Kumar — Ashish Kumar is a Technical Fellow at MongoDB, where he focuses on architectural improvements across the company’s product offerings. He joined MongoDB through the acquisition of Grainite, where he was Co-Founder and CEO. At Grainite, he led the development of a first-of-its-kind transactional database unifying native stream storage and parallel processing. Previously, Ashish spent 11 years as a Senior Engineering Director at Google, most recently leading the teams responsible for BigTable, Spanner, Datastore, and Firestore. During his tenure at Google, he also managed teams across Hardware, Display Ads, and Developer Tools. Earlier in his career, Ashish held executive roles at Sun Microsystems and infrastructure startups. He holds a Bachelor’s in Business from SRCC, Delhi University.
Anant Jhingran — Anant Jhingran is CTO for IBM Software, a role he took on when StepZen — the GraphQL API company he co-founded and led as CEO — was acquired by IBM in February 2023. Before StepZen, he helped take Apigee public and through its acquisition by Google. Earlier at IBM, he was an IBM Fellow and CTO of the Information Management Division, shipping products that generated billions in revenue across IBM and Apigee. He holds a PhD in database systems from UC Berkeley and is a Distinguished Alumnus of IIT Delhi, with over a dozen patents and 20+ technical papers to his name.
Junaid Ahmed — Junaid Ahmed is Vice President of Engineering at Datadog, where he leads the Applications pillars of Observability, including several AI efforts, and helps evolve Datadog’s offering for the agentic future. Before Datadog, he held senior engineering leadership roles as Director of Engineering at Apple and General Manager at Microsoft, working on large-scale problems in search, advertising, recommendations, and deep learning. He is the co-author of several papers, including research on “Approximate Nearest Neighbor methods for Dense Text Retrieval” (ICLR 2021), and holds 20+ patents in search ranking, content understanding, and neural information retrieval. Junaid studied at the University of Washington.
Anupam Datta — Anupam Datta is Principal Research Scientist and Snowflake AI Research Lead at Snowflake, which he joined through the acquisition of TruEra in 2024. He was Co-Founder and Chief Scientist of TruEra from 2019 to 2024, building tools for trustworthy AI evaluation and observability. Before TruEra, Anupam was a tenured Professor of Electrical & Computer Engineering and Computer Science at Carnegie Mellon University from 2007 to 2022, where he remains an Adjunct Professor; his research spans trustworthy AI, including evaluation, explainability, fairness, and robustness of ML and GenAI systems. He holds a Ph.D. and M.S. in Computer Science from Stanford University and a B.Tech. in Computer Science and Engineering from IIT Kharagpur.
Program
May 26, 2026 — In person in San Jose, California.
| Time | Activity | Details |
|---|---|---|
| 1:30 – 1:40 PM | Welcome | Introductory remarks |
| 1:40 – 2:20 PM | Keynote | Aaron Katz |
| 2:20 – 3:00 PM | Keynote | Andy Pavlo |
| 3:00 – 3:30 PM | Break | Coffee break |
| 3:30 – 4:30 PM | Contributions | Lightning talks |
| 4:30 – 5:10 PM | Keynote | Nikita Shamgunov |
| 5:10 – 6:00 PM | Panel & closing | Moderated by Ciro Greco |
| 6:30 PM – 9:00 PM | Happy hour, drinks, and awards | Register here |
Important Dates
| Milestone | Date |
|---|---|
| Workshop | Tue, May 26, 2026 |
Call for Papers
We invite submissions on the emerging intersection of AI agents and data systems. Drawing from the workshop manifesto, we are mainly interested in contributions along these research directions:
- Productionizing agentic workloads. Capturing the nuances of agentic reasoning and engineering techniques across the entire data lifecycle.
- Optimizing agent semantics. Exploring the transition from deterministic SQL execution to agent-driven pipelines.
- Data systems for agents. Rethinking core system guarantees from the ground up to support non-human workloads.
- Agents for system design. Exploring the “self-driving” potential of the stack, where agents autonomously design, tune, and maintain the very infrastructure they inhabit.
Examples of topics include, but are not limited to:
- Agent-first OLAP architectures for safe, reproducible, and cost-efficient analytics
- Agentic analytics workflows, including human-in-the-loop patterns and production failure modes
- Evaluation methodologies, benchmarks, and workload traces for data agents
- LLM-assisted optimization and tuning for data systems
- War stories and postmortems from production deployments
- Agentic workflows for data engineering and data science
- Operational reliability for agent-driven automation, including observability, guardrails, governance, and cost controls
- Work-in-progress and early-stage results that showcase novel ideas or promising directions, even if not yet fully evaluated
Submission formats
We solicit:
- Short research and position papers: up to 4 pages plus references
- Late-breaking results / extended abstracts: up to 2 pages plus references
Submissions are reviewed in a single-blind process by members of the committee. We welcome overlapping submissions with other venues (SAO is non-archival). For more background, recent relevant literature, and inspiring use cases, see our workshop proposal.
Program Committee
- Aldrin Montana — Bauplan Labs
- Alperen Keleş — University of Maryland
- Bonnie Xu — OpenAI
- Davide Eynard — Mozilla AI
- Eugene Wu — Columbia University
- Federico Bianchi — Together AI
- Gaetano Rossiello — IBM
- Joseph Axisa — Google
- Nandana Mihindukulasooriya — IBM
- Nicole Rose Schneider — University of Maryland
- Sesh Nalla — Datadog
- Stephanie Wang — MongoDB
- Tao Ye — Lyft
- Till Döhmen — MotherDuck
Accepted Papers
The program features contributions from leading academic and industry organizations, including Stanford, Columbia, NVIDIA, CoreWeave, IBM, MongoDB, Databricks, Bauplan, and many others.
- A Case for Simulation-Driven Resilience in Agent-First Data Systems (paper)
  Aleksey Charapko, Murat Demirbas, Akshat Vig
- A Query Engine for the Agents (paper)
  Kenny Daniel
- Agents for Data Streaming Tasks: The Missing Pieces (paper)
  Shreesha Gopalakrishna Bhat, Landon Johnson, Michael Noguera, Aishwarya Ganesan, Ramnatthan Alagappan
- Autonomous Agent Learning in Production (paper)
  Xinhao Cheng, Patrick Coppock, Jianan Ji, Zhihao Jia, Vasilis Kypriotis, Dimitrios Skarlatos, Eliot Solomon, Zhihao Zhang, Yu Zhou
- Beyond Semantic Similarity: Performance and Costs of Agentic Retrieval for Complex Tasks (paper)
  Reza Esfandiarpoor, Radek Osmulski, Yauhen Babakhin, Gabriel de Souza P. Moreira, Oliver Holworthy, Jie He, Ronay Ak, Jiarui Cai, Ryan Chesler, Bo Liu, Even Oldridge
- Beyond the Shell: Extending Agents with Reactive Python Notebooks (paper)
  Trevor Manz, Myles Scolnick, Akshay Agrawal
- BranchBench: An Extensible Benchmark for Agentic Database Branching (paper)
  Elaine Ang, Sam Weldon, In Keun Kim, Kevin Durand, Kostis Kaffes, Eugene Wu
- Colloquy (cq): Sharing Failure Modes to Help Agents (paper)
  Peter Wilson, Daniel Nissani
- Data Journalist Agent: Transforming Data into Trustworthy Multimodal Story (paper)
  Kevin Qinghong Lin, Batu EI, Yuhong Shi, Pan Lu, Philip Torr, James Zou
- Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems (paper)
  Gaetano Rossiello, Dharmashankar Subramanian
- Grounding Agent-Driven Code Optimization in Production Telemetry (paper)
  Piotr Bejda, Junaid Ahmed
- Lumilake: An Agentic Analytics Engine for AI4Science (paper)
  Zhengyuan Su, Noppanat Wadlom, Junyi Shen, Yicong Huang, Wentao Wu, Yao Lu
- Metaxy: Field-Level Metadata Management for Incremental Multimodal ML Pipelines (paper)
  Daniel Gafni, Georg Heiler
- Modular Monoliths: Agentic Analytical Database Architecture (paper)
  Giuseppe Mazzotta, SJ Saidi, Mosha Pasumansky, Benjamin Wagner
- Parsing Is Not Executing: Decentralized Compliance for Agentic Query Plan Routing (paper)
  Ranjan Sinha
- Querying Everything Everywhere All at Once (paper)
  Jacopo Tagliabue, Aldrin Montana
- Sophrosyne: Agentic Exploration of Relational Data Systems Needs Moderation (paper)
  Madhav Jivrajani, Ramnatthan Alagappan, Aishwarya Ganesan
- TexeraAgent: An AI-Agent for Data Science Using Dataflows (paper)
  Jiadong Bai, Yicong Huang, Chen Li
- The Hydration Proxy Pattern: Architecting Conversational Data Systems for Stateless LLM APIs (paper)
  Joseph Axisa
- The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane (paper)
  Tyler Akidau, Tyler Rockwood, Johannes Brüderl, Marc Millstone
- Towards a Context Layer for Self-Improving Data Agents (paper)
  Till Döhmen, Jacob Matson, Jordan Tigani
- When Agents Outgrow RAG: Building Production Retrieval Systems in the Lakehouse (paper)
  Chang She, Prashanth Rao
- Workflow, Not Prose: A Multi-Agent Methodology for Data Agent (paper)
  Chia-liang Kao, Kent Huang
Contact
For questions, please contact:
Jacopo Tagliabue
jacopo.tagliabue@bauplanlabs.com
Elaine Ang
ra3448@columbia.edu