LIVESTREAM

Agents in Production 2025

AI Agents Are Already Working — Let’s Talk About It

Agents are no longer just experiments. From e-commerce to customer support to analytics workflows, they’re quietly getting real work done in production.

On July 17, join the MLOps Community for Part 2 of Agents in Production — a virtual event focused on the messy, practical side of building and deploying AI agents.

What’s on deck?

Taming complexity: agent memory, behavior control, latency vs. response tradeoffs
Stories from the field: How companies are actually using agents in live environments
Tooling that works: routing, evaluation, UX, and cost performance in the wild

It’s free, it’s global, and it’s going to be packed.

Speakers

Dipanwita Mallick

Principal Product Manager @ HP

Philipp Schmid

AI Developer Experience @ Google DeepMind

Shraddha Yeole

Senior Software Engineer, Machine Learning @ ThousandEyes, part of Cisco

Allegra Guinan

Co-founder @ Lumiera

Yuki Watanabe

Sr. Software Engineer @ Databricks

Madison Kanna

Growth Engineer @ Baseten

Sarah Gebauer

Founder and Physician @ Validara Health

Venkata Gopi Kolla

Lead Software Engineer @ Salesforce Inc

Surendra Narang

Senior Manager Cyber Security @ Palo Alto Networks

Meryem Arik

CEO / Co-founder @ Doubleword

Krista Opsahl-Ong

Research Engineer @ Databricks

Gal Peretz

Head of AI @ Carbyne

Tula Masterman

Principal AI Agent Solutions Architect @ MeshAgent

Lars Maaløe

Co-founder & CTO @ Corti

Mariana Prazeres

AI Engineer @ Run the eval loop

Stephanie Kirmer

Senior Machine Learning Engineer @ DataGrail

Erica Hughberg

Community Advocate @ Tetrate

Annie Condon

AI Solutions Engineer @ Acre Security

Jeff Groom

Director of Engineering, AI @ Acre Security

Maria Zhang

CEO & Co-Founder @ Palona AI

Somya Rai

Principal AI Engineer @ EXL

Aurimas Griciūnas

Founder & CEO @ SwirlAI

Valliappa Lakshmanan

Operating Executive @ N/A

Robert Caulk

C*O @ AskNews

Diego Oppenheimer

Head of Product @ Hyperparam

Shahul Elavakkattil Shereef

Co-founder & CTO @ Ragas

Simba Khadder

Founder & CEO @ Featureform

Nimrod Busany

Founder & Chief Scientist @ Traigent

Patrick Barker

CTO @ Kentauros AI

Edward Upton

Founding Engineer @ Asteroid

Advait Patel

Senior Site Reliability Engineer @ Broadcom

Dexter Horthy

Founder @ HumanLayer

Ben Labaschin

Principal Machine Learning Engineer @ Workhelix

Jonas Scholz

Co-founder @ Sliplane

Vrushank Vyas

Dev Rel / GTM @ Portkey

Shai Rubin

CTO and Co-founder @ Strudel AI

Pierre Gerardi

MLOps Team Lead & Senior Machine Learning Engineer @ Superlinear

Hakan Tek

Full-stack Developer @ Digital Data GmbH

Erik Goron

ML / AI Engineer @ Happyrobot

Devin Stein

Founder & CEO @ Dosu

Adam Sroka

CEO & Co-founder @ Hypercube Consulting

Mohamed Rashad

Co-founder & CTO @ DevisionX

Tanmay Tiwari

Senior Software Full Stack Engineer @ Rivian via BayOne

Ryan Fox-Tyler

Co-founder and SVP Product/Engineering @ Hypermode

Colin McNamara

Co-founder; Managing Partner for Engineering @ Always Cool Brands | Always Cool AI

David de la Iglesia Castro

AI Engineer @ Mozilla.ai

Renato Byrro

Machine Experience Engineer @ Arcade.dev

Brooke Hopkins

Founder @ Coval

Peter Bakkum

Member of Technical Staff, Multimodal API Lead @ OpenAI

Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

Agenda

Stage 1

Stage 2

Stage 3

3:40 PM, GMT

3:45 PM, GMT

Opening / Closing

Welcome Note

3:50 PM, GMT

4:15 PM, GMT

Keynote

The Future of Compute: How AI Agents Are Reshaping Infrastructure

The rapid evolution of AI agents is exposing a widening gap between their unique computational needs and today’s infrastructure. This keynote cuts through the hype to highlight why traditional compute paradigms—mainframes, VMs, containers, even serverless—are struggling to keep up with agents’ bursty, stateful, and hardware-hungry workloads. We’ll examine the economic and technical inefficiencies organizations face, from unpredictable scaling to persistent state management, and why simply “tweaking the cloud” won’t cut it. Expect a candid look at the real operational challenges, the architectural dead-ends, and the tough question: do we adapt existing frameworks, or is it time for a radical rethink of how we design and manage compute for the AI era? Actionable insights, not wishful thinking.

4:20 PM, GMT

4:45 PM, GMT

Presentation

Beyond Chatbots: How to build Agentic AI systems with Google Gemini

As AI continues to evolve, we will see a shift from static chatbots to dynamic agentic AI systems capable of autonomous reasoning, tool integration, and multi-step problem-solving. This talk explores how to design AI agents that leverage structured outputs, function calling, and workflow orchestration with Google Gemini.

4:50 PM, GMT

5:00 PM, GMT

Lightning Talk

How to Build Execution Layers That Don’t Burn Out

We’ve all built tools that looked good in demos but broke the moment we let go of the wheel. This talk is about building something different—systems that don’t just run, but run well when no one’s watching.

In under 10 minutes, I’ll walk through how I designed an execution layer that handles thousands of operations daily—without melting under pressure, without drifting off course, and without needing constant supervision.

I’ll share the real structure:
• How it thinks
• How it decides what to do next
• How it knows when to stop
• And how it stays sane when things go wrong

No theory, no fluff—just what it took to make something dependable in a world that isn’t.

5:05 PM, GMT

5:30 PM, GMT

Presentation

From Guesswork to Greatness: Systematic AI Agent Optimization in Production

Every engineer building AI agents has experienced it: you tweak a prompt, swap out a model, or adjust a RAG setting—only to find it either worsens the agent or improves one aspect while breaking another. Why does this happen? Because teams typically test just one configuration out of countless possible combinations, hoping for the best.

Current evaluation tools are built for single-point assessments, not the extensive multi-dimensional comparisons that real-world scenarios demand. Sure, you might be able to A/B test two prompts or select from a few models, but exploring hundreds of configurations across dimensions like cost, latency, and accuracy simultaneously is nearly impossible.

In this talk, we'll demonstrate how adopting a structured approach to testing alternatives can significantly change outcomes. Leveraging concepts from multi-objective optimization, we’ll illustrate how Traigent's SDK and UI empower engineers to allocate their testing budgets effectively. Traigent intelligently identifies and explores promising configurations, highlighting optimal tradeoffs. You'll learn how this methodology can yield quality improvements of 4–7x and reduce costs by up to 90%, all without resorting to guesswork or manual trial-and-error.
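
To make the multi-objective idea concrete, here is a minimal sketch of picking the non-dominated (Pareto-optimal) configurations across cost, latency, and accuracy. It is not Traigent's SDK or API. The `Config` class, the helper names, and every number below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate agent configuration with hypothetical measurements."""
    name: str
    cost: float      # lower is better (made-up USD per 1k requests)
    latency: float   # lower is better (seconds)
    accuracy: float  # higher is better (fraction of eval cases passed)


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on every axis and better on at least one."""
    no_worse = (a.cost <= b.cost and a.latency <= b.latency
                and a.accuracy >= b.accuracy)
    strictly_better = (a.cost < b.cost or a.latency < b.latency
                       or a.accuracy > b.accuracy)
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Hypothetical evaluation results, not taken from the talk.
CANDIDATES = [
    Config("small model, short prompt", cost=1.0, latency=0.4, accuracy=0.71),
    Config("small model, long prompt", cost=1.6, latency=0.6, accuracy=0.78),
    Config("large model, short prompt", cost=6.0, latency=1.1, accuracy=0.86),
    Config("large model, long prompt", cost=9.0, latency=1.5, accuracy=0.85),
]

if __name__ == "__main__":
    for cfg in pareto_front(CANDIDATES):
        print(cfg.name)
```

With these made-up numbers, the "large model, long prompt" candidate drops out because another candidate is cheaper, faster, and more accurate. The remaining three are genuine tradeoffs that a team would still have to choose between.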

5:35 PM, GMT

6:00 PM, GMT

Panel Discussion

Underwriting Assist - A Multi Agent System

Underwriting Assist is a LangChain- and Ray-powered multi-agent system that accelerates insurance underwriting by 3x and cuts manual errors by 40%. It leverages RAG, shared memory, and LLM-based agents for clause analysis, risk profiling, and rationale generation. Real-time evals and human-in-the-loop feedback ensure accuracy, explainability, and regulatory compliance at scale.

6:05 PM, GMT

6:15 PM, GMT

Break

Swag Giveaway

6:15 PM, GMT

6:40 PM, GMT

Presentation

Building Agents for Healthcare

Healthcare is one of the most vital and far-reaching sectors in our society, touching every individual at some point in their lives. Yet, it faces mounting challenges: rising administrative burdens, increasingly complex disease patterns, and growing patient volumes strain already stretched systems. In this talk, Lars will explore the untapped potential of AI agents to address some of healthcare’s most pressing real-world problems. He will present Corti’s unique approach to developing domain-specific agents equipped with healthcare-relevant skills—engineered not only for impact, but within a framework that places governance and safety at its core. Join us to learn how AI can be responsibly and powerfully deployed to support the future of care.

6:45 PM, GMT

7:10 PM, GMT

Presentation

How to Stop AI Agents from Bleeding Your Cloud Budget

As AI agents become active participants in production environments, handling infrastructure tasks, chaining tools, generating outputs, and executing plans across cloud services, the financial implications are often underestimated. These agents may appear intelligent, but they have zero awareness of cost boundaries. A single agent loop with poorly bounded retries, excessive API calls, or unrestricted tool usage can quietly rack up hundreds or even thousands of dollars in compute, token, or storage costs.

In this session, I’ll walk through how seemingly harmless design decisions, like overly verbose prompts, excessive tool chaining, or unrestricted LLM usage, can result in runaway spending. I’ll share lessons from deploying agentic systems in cloud-native pipelines and infrastructure security tools, including my work on DockSec, an open-source AI-powered container security analyzer. We’ll explore how agents misbehave in cloud billing terms and what the attendees can do to stop it.

Attendees will learn practical strategies to monitor, contain, and optimize agent costs: from integrating cost observability into your agent stack, to programmatically setting retry, token, and API call budgets, to leveraging agent memory, caching, and behavior throttling to reduce waste. Whether they’re scaling agents in production or just starting to build them, this talk will give them the tools to design agent systems that are not only intelligent but also financially sustainable.
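
As a rough illustration of setting retry, token, and API call budgets programmatically, the sketch below wraps an agent loop in a simple in-process guard. It is not code from the talk or from DockSec. `AgentBudget`, `BudgetExceeded`, `run_steps`, and the limits used are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push the agent past one of its limits."""


@dataclass
class AgentBudget:
    """Hypothetical per-run limits on calls, tokens, and retries."""
    max_calls: int
    max_tokens: int
    max_retries: int
    calls: int = 0
    tokens: int = 0
    retries: int = 0

    def charge(self, tokens: int, is_retry: bool = False) -> None:
        """Record one model or tool call, refusing it if any limit would be crossed."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("API call budget exhausted")
        if self.tokens + tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted")
        if is_retry and self.retries + 1 > self.max_retries:
            raise BudgetExceeded("retry budget exhausted")
        # Only mutate the counters once every check has passed.
        self.calls += 1
        self.tokens += tokens
        self.retries += int(is_retry)


def run_steps(step_token_estimates: list[int], budget: AgentBudget) -> int:
    """Charge each planned step in order; stop at the first one that does not fit."""
    completed = 0
    for estimate in step_token_estimates:
        try:
            budget.charge(estimate)
        except BudgetExceeded:
            break  # a real agent might summarize progress or escalate to a human here
        completed += 1
    return completed


if __name__ == "__main__":
    budget = AgentBudget(max_calls=3, max_tokens=1000, max_retries=1)
    print(run_steps([400, 400, 400, 100], budget), budget.tokens)
```

The guard checks every limit before it updates any counter, so a rejected call leaves the budget unchanged and the loop can stop cleanly instead of spending first and noticing later.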

7:15 PM, GMT

7:40 PM, GMT

Presentation

Machine Experience Engineering

Machine Experience (MX) Engineering is similar to UX Engineering, but focused on creating interfaces that efficiently address AI models' needs and are easy for them to understand.

Existing APIs were developed for human software engineers. If we want LLMs to be reliable when calling tools/functions, we have to develop interfaces that are tailored for their reasoning model.

7:45 PM, GMT

7:55 PM, GMT

Break

Musical Jamz

7:55 PM, GMT

8:20 PM, GMT

Presentation

The Facts Flywheel

Agent memory and organizational knowledge are actually one and the same. Organizations remember the whys and how-tos by writing down learnings in a centralized place. For agents to remember, they need to do the same.

But they both suffer from the same shortcoming: they are nearly impossible to keep up to date. Can we solve both problems at once?

8:25 PM, GMT

8:50 PM, GMT

Presentation

Too much lock-in for too little gain: agent frameworks are a dead-end

If your goal is to accelerate development of agentic systems without sacrificing production quality, a great choice is to use simple, composable GenAI patterns and off-the-shelf tools for monitoring, logging, and a few other capabilities. In this talk, I'll present an architecture built from such patterns that lets you build agentic systems without locking yourself into any LLM, cloud, or agent framework. The patterns I talk about are from my GenAI design patterns book, which is in early release on O'Reilly's platform.

8:55 PM, GMT

9:05 PM, GMT

Lightning Talk

Evaluating AI Agents: Why It Matters and How We Do It

As we integrate agentic AI into business products, robust evaluation of the agents is essential to delivering the highest quality. Proper evaluation ensures that AI agents are reliable, safe, effective, and aligned with user intent. Unlike traditional software or machine learning models, AI agents are non-deterministic and require specific types of evaluation. This talk outlines the importance of evaluating AI agents, the key components that we version and test at Acre Security, the metrics that matter for different types of agents, and how we currently achieve success evaluating AI agents that we build at Acre.

9:10 PM, GMT

9:20 PM, GMT

Lightning Talk

From Spikes to Stories: AI-Augmented Troubleshooting in the Network Wild

It’s 2 a.m., and a critical service slows down. Dashboards scream red: packet loss, timeouts, delays. The clock is ticking. Eyes race across a maze of graphs, flipping through visualizations and route tables. One graph leads to another. A dozen tabs open. Fatigue sets in. You’re left guessing: Is it the network, the application, or something else? Welcome to the new normal in network operations, where telemetry is endless, but clarity is rare. This session explores how AI and large language models (LLMs) transform observability by evolving views from data presentation to intelligent data interpretation. Instead of manually piecing together clues, imagine asking, “What’s wrong here?” and receiving clear, contextual insights. AI-powered storytelling augments human reasoning, reduces noise, and accelerates fault isolation, lowering misdiagnosis risk and improving mean time to identify (MTTI) and resolve (MTTR). Join us to see how storytelling is reshaping digital operations.

9:25 PM, GMT

9:35 PM, GMT

Break

Meme Showdown

9:35 PM, GMT

9:45 PM, GMT

Lightning Talk

The Hidden Infrastructure Behind Every AI Agent

AI agents aren't just generating content; they're generating traffic. Like any good agent, your AI agent isn’t working alone. Behind the scenes is a mission-critical handler: the AI Gateway.

In this lightning talk, we'll explore how gateways are evolving to handle the new realities of GenAI: dynamic routing, access control, cost-aware load balancing, model-aware failover, and observability across multi-model environments.

If you're building agents or just trying to keep up with the traffic they generate, this talk will help you understand the infrastructure patterns that are evolving to support a new landscape of software.

9:50 PM, GMT

10:00 PM, GMT

Lightning Talk

From Console Scripts to Agentic Services: Building Observability into Everyday LLM Workflows

This talk shares the ongoing, real-world journey of building agentic infrastructure at AlwaysCool.ai—from simple GPT-based tools to our first production-ready AI microservices. We started with small wins like automating nutritional analysis and FDA label validation, but quickly ran into issues with sync limits, cost control, and debugging blind spots.

That led us to build a shared agentic service layer, using LangGraph to orchestrate multi-step flows and FastAPI to serve those agents cleanly. With OpenTelemetry at the core, we now send metrics and traces to Prometheus, Grafana, and LangSmith for real-time visibility, which is critical for compliance workflows such as HACCP, CAPA, and FDA traceability.

We’re not claiming to have it all figured out—this is a story of learning in the open, much like we do at the Austin AI Middleware Users Group (AIMUG). If you're navigating the same terrain—tooling decisions, observability gaps, or production pressure—this talk offers patterns, tools, and cautionary lessons worth carrying into your own journey.

10:05 PM, GMT

10:30 PM, GMT

Presentation

Driving Evaluation-Driven Development with MLflow 3.0

Quality is the top barrier preventing Agentic applications from reaching production. This talk introduces Evaluation-Driven Development, a methodology that uses evaluation as the cornerstone for building high-quality, reliable Agentic systems. We will demonstrate how to drive it with MLflow 3.0, a new generation of the popular MLOps platform redesigned for the LLM era, including one-line observability, automatic evaluation, human-in-the-loop feedback loops, and monitoring.
