Agents in Production 2025
# Agents in Production
# Voice AI
# Coval
# OpenAI
Building Real-Time, Reliable Voice AI: From Simulation to Production // Brooke Hopkins | Peter Bakkum // Agents in Production 2025
Join Brooke Hopkins (Founder & CEO, Coval) and Peter Bakkum (API Multimodal Lead, OpenAI) for an insightful fireside chat focused on the cutting-edge voice-to-voice architectures powering modern voice AI applications. They’ll unpack the unique challenges of designing and deploying real-time, multimodal systems that enable seamless, natural conversations between users and AI agents. Drawing from Brooke’s expertise in simulation and evaluation at scale and Peter’s experience building OpenAI’s real-time APIs, this conversation will dive into how infrastructure, latency optimization, and rigorous testing come together to create reliable, production-ready voice AI experiences.


Brooke Hopkins & Peter Bakkum · Jul 30th, 2025

Erik Goron · Jul 30th, 2025
Voice agents live or die on latency and trust. I'll share how HappyRobot's MLOps pipeline turns raw production audio into high-accuracy, low-latency models: 1. Synthetic labels first: we generate large-scale annotations with reasoning LLMs. 2. Human in the loop: a targeted subset of samples is reviewed by human annotators to correct drift and refine prompts (DSPy-style). 3. Distill & specialize: small, domain-tuned models are fine-tuned via LoRA and distillation. We'll walk through our MLOps stack, from observability to AI-assisted data generation to model fine-tuning and optimization.
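The three steps above map onto a small amount of orchestration code. As a rough illustration (not HappyRobot's actual stack), here is a minimal Python sketch of step 1, generating synthetic labels with a reasoning model via an OpenAI-compatible client, and of selecting a targeted subset for human review; the model name, prompt, and `sample_for_review` heuristic are illustrative assumptions.

```python
import json
import random
from openai import OpenAI  # any OpenAI-compatible client works here

client = OpenAI()  # assumes an API key or a self-hosted base_url is configured

LABEL_PROMPT = (
    "You label call transcripts. Return JSON with keys "
    "'intent' and 'confidence' (0-1). Transcript:\n{transcript}"
)

def synthetic_label(transcript: str, model: str = "o4-mini") -> dict:
    """Step 1: large-scale annotation with a reasoning LLM (illustrative prompt/model)."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": LABEL_PROMPT.format(transcript=transcript)}],
    )
    return json.loads(resp.choices[0].message.content)

def sample_for_review(labeled: list[dict], rate: float = 0.05) -> list[dict]:
    """Step 2: route low-confidence labels plus a random slice to human annotators."""
    low_conf = [x for x in labeled if x["label"]["confidence"] < 0.7]
    spot_check = random.sample(labeled, max(1, int(len(labeled) * rate)))
    return low_conf + spot_check

if __name__ == "__main__":
    transcripts = ["Caller asks to reschedule a pickup for tomorrow morning."]
    labeled = [{"transcript": t, "label": synthetic_label(t)} for t in transcripts]
    print(sample_for_review(labeled))
```

Step 3 (LoRA fine-tuning and distillation of the small, domain-tuned models) is omitted here for brevity.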
# Agents in Production
# Voice Agents
# LLM
# HappyRobot

Shai Rubin · Jul 30th, 2025
Everyone is building AI agents, but at their core is the LLM, and choosing the right one is critical. With new models launching every week, each promising game-changing productivity, how do we make informed, data-driven choices? In this talk, I'll focus on LLM selection for a critical agent skill: code understanding. I'll present a study applying 15 leading LLMs to real-world code summarization tasks, using practical, agent-relevant metrics such as verbosity, latency, cost, human-aligned accuracy, and information gain. We'll explore how these models actually perform in practice, beyond benchmarks and hype, and what that means for building effective, capable agents. Whether you're building autonomous coding assistants, dev-focused copilots, or multi-modal agent systems, choosing the right LLM isn't optional; it's the foundation. This talk aims to cut through the noise and offer actionable insights to help you select the best model for your agent's real-world success.
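To make the comparison concrete, a minimal harness for this kind of measurement might look like the sketch below: it times a summarization call per model and records latency, verbosity (output tokens), and an estimated cost. The model list, prices, and prompt are placeholders, not figures from the talk; human-aligned accuracy and information gain need labeled references, so they are omitted here.

```python
import time
from openai import OpenAI  # any OpenAI-compatible endpoint

client = OpenAI()

# Illustrative $ per 1M tokens (input, output); substitute your providers' real price sheets.
PRICES = {"gpt-4.1-mini": (0.40, 1.60), "gpt-4.1": (2.00, 8.00)}

PROMPT = "Summarize what this function does in two sentences:\n\n{code}"

def evaluate(model: str, code: str) -> dict:
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(code=code)}],
    )
    latency = time.perf_counter() - start
    usage = resp.usage
    in_price, out_price = PRICES[model]
    cost = (usage.prompt_tokens * in_price + usage.completion_tokens * out_price) / 1e6
    return {
        "model": model,
        "latency_s": round(latency, 2),
        "verbosity_tokens": usage.completion_tokens,  # crude proxy for verbosity
        "cost_usd": round(cost, 6),
        "summary": resp.choices[0].message.content,
    }

if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b"
    for m in PRICES:
        print(evaluate(m, snippet))
```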
# Agents in Production
# LLM
# Smart Agents
# Strudel AI

Jonas Scholz · Jul 30th, 2025
"If There's Free Compute, There's Abuse": Fighting Fraud with Lightweight LLM Agents. At Sliplane, a managed container hosting platform, we offer developers an easy way to deploy their apps, with the unfortunate side effect of attracting abuse. From crypto miners to phishing kits, attackers love free trials. Traditional heuristics can detect DDoS or mining, but phishing and spam bots often hide behind innocuous-looking code. In this lightning talk, I'll share how we built a lightweight AI agent pipeline that inspects user repositories when abuse is suspected. It runs on a small, self-hosted Mistral model and applies natural-language policies to flag suspicious behavior, such as fake login pages or spam scripts. The agent picks and summarizes key files, checks them against soft rules, and helps a human-in-the-loop make faster decisions, all without sending private customer code to a third-party API. This talk goes into the architecture, cost optimizations for running small models, and why this beats rule-based filters at scale.
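As a rough sketch of the pipeline described (not Sliplane's actual code), the core loop can stay small: pick a handful of suspicious-looking files, send each to a small self-hosted model with the policy written in plain language, and surface the flags to a human reviewer. The file-selection heuristic, policy text, and the assumption of an OpenAI-compatible endpoint in front of the Mistral model are all illustrative.

```python
from pathlib import Path
from openai import OpenAI  # pointed at a self-hosted, OpenAI-compatible server

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

POLICY = (
    "You review code for abuse on a hosting platform. Flag fake login pages, "
    "phishing kits, spam or mass-mailer scripts, and crypto miners. "
    "Answer with 'FLAG: <reason>' or 'OK'."
)

SUSPICIOUS_NAMES = ("login", "mail", "smtp", "wallet", "miner", "index.html")

def pick_key_files(repo: Path, limit: int = 5) -> list[Path]:
    """Crude selection heuristic: small text files with suspicious names come first."""
    candidates = [p for p in repo.rglob("*") if p.is_file() and p.stat().st_size < 50_000]
    candidates.sort(key=lambda p: not any(s in p.name.lower() for s in SUSPICIOUS_NAMES))
    return candidates[:limit]

def review_file(path: Path, model: str = "mistral-7b-instruct") -> str:
    """Apply the natural-language policy to one file's contents."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": f"File {path.name}:\n{path.read_text(errors='ignore')[:4000]}"},
        ],
    )
    return resp.choices[0].message.content

def triage(repo: Path) -> list[tuple[str, str]]:
    """Return (file, verdict) pairs for a human-in-the-loop to act on."""
    return [(str(p), review_file(p)) for p in pick_key_files(repo)]

if __name__ == "__main__":
    for file, verdict in triage(Path("./suspect-repo")):
        print(file, "->", verdict)
```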
# Agents in Production
# Free Compute
# LLM
# Fraud
# Sliplane

Mohamed Rashad · Jul 30th, 2025
In the fast-advancing world of AI systems, cost optimization isn't just a luxury; left unchecked, costs can break your company. As these systems scale and complexity grows, keeping expenses in check across infrastructure, model serving, latency management, and resource allocation becomes increasingly challenging. In this talk, we'll go through the critical challenges surrounding cost management in multi-agent environments. We'll dive into real-world solutions, highlighting hybrid deployment strategies (cloud and on-prem), precise cost-tracking tools, and resource optimization tactics that have delivered significant savings without compromising performance. We will also cover real use cases from our experience over the last decade, along with actionable insights drawn from them.
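One of the building blocks mentioned above, precise cost tracking, can start very simply. The sketch below (illustrative prices and labels, not the tooling from the talk) accumulates token usage per agent and model so spend can be attributed before it surprises you.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative $ per 1M tokens (input, output); replace with your real price sheet.
PRICES = {"small-model": (0.15, 0.60), "large-model": (2.50, 10.00)}

@dataclass
class Usage:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost_usd: float = 0.0

class CostTracker:
    """Aggregates per-call token usage and cost, keyed by (agent, model)."""

    def __init__(self):
        self.by_key: dict[tuple[str, str], Usage] = defaultdict(Usage)

    def record(self, agent: str, model: str, prompt_tokens: int, completion_tokens: int):
        in_p, out_p = PRICES[model]
        u = self.by_key[(agent, model)]
        u.calls += 1
        u.prompt_tokens += prompt_tokens
        u.completion_tokens += completion_tokens
        u.cost_usd += (prompt_tokens * in_p + completion_tokens * out_p) / 1e6

    def report(self):
        for (agent, model), u in sorted(self.by_key.items(), key=lambda kv: -kv[1].cost_usd):
            print(f"{agent}/{model}: {u.calls} calls, ${u.cost_usd:.4f}")

if __name__ == "__main__":
    tracker = CostTracker()
    tracker.record("router", "small-model", 1200, 300)
    tracker.record("planner", "large-model", 8000, 1500)
    tracker.report()
```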
# Agents in Production
# Cost Optimization
# DevisionX

Meryem Arik · Jul 30th, 2025
Generative AI and Agentic AI hold the potential to revolutionize everyday business operations. However, for highly regulated enterprises, security and privacy are non-negotiable, and shared LLM API services often aren't appropriate. In this session, we will explore the open-source landscape and identify various applications where owning your own stack can lead to enhanced data privacy and security, greater customization, and cost savings in the long run. Our talk will take you through the entire process, from idea to implementation, guiding you through selecting the right model, deploying it on suitable infrastructure, and ultimately building a robust AI agent. By the end of this session, attendees will gain practical insights to enhance their ability to develop high-value Generative AI applications. You will leave with a deeper understanding of how to empower your organization with self-hosted solutions that prioritize control, customization, and compliance.
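To give a flavor of the end state, once an open model is served inside your own network behind an OpenAI-compatible endpoint (for example via vLLM or a similar inference server), agent code can call it with the standard client. The internal URL and model name below are placeholders, not recommendations from the talk.

```python
from openai import OpenAI

# Point the standard client at a model served inside your own network;
# the base_url and model name are placeholders for whatever you deploy.
client = OpenAI(base_url="http://llm.internal:8000/v1", api_key="not-needed-on-prem")

def ask(question: str, model: str = "meta-llama/Llama-3.1-8B-Instruct") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are an internal assistant. Data never leaves our network."},
            {"role": "user", "content": question},
        ],
        temperature=0.2,
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(ask("Summarize our data-retention policy for customer support staff."))
```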
# Agents in Production
# Idea to Implementation
# Self-host
# Doubleword

Valliappa Lakshmanan · Jul 28th, 2025
If your goal is to accelerate development of agentic systems without sacrificing production quality, a great choice is to use simple, composable GenAI patterns along with off-the-shelf tools for monitoring, logging, and a few other capabilities. In this talk, I'll present an architecture consisting of such patterns that will enable you to build agentic systems in a way that does not lock you into any LLM, cloud, or agent framework. The patterns I talk about are from my GenAI design patterns book, which is in early release on O'Reilly's platform.
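One concrete way to avoid lock-in, in the spirit of the patterns described above (though not an example taken from the book), is to write agent logic against a tiny interface of your own and keep each provider behind an adapter. A minimal Python sketch:

```python
from typing import Protocol

class ChatModel(Protocol):
    """The only surface the rest of the agent code is allowed to see."""
    def complete(self, system: str, user: str) -> str: ...

class OpenAIChat:
    """Adapter for an OpenAI-compatible provider."""
    def __init__(self, model: str):
        from openai import OpenAI  # imported lazily so other adapters need no extra deps
        self._client, self._model = OpenAI(), model

    def complete(self, system: str, user: str) -> str:
        resp = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "system", "content": system}, {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content

class EchoChat:
    """Stand-in adapter for tests; swap in any other vendor's SDK the same way."""
    def complete(self, system: str, user: str) -> str:
        return f"[stub reply to: {user}]"

def summarize(llm: ChatModel, text: str) -> str:
    # Agent logic depends only on the ChatModel protocol, not on any vendor SDK.
    return llm.complete("Summarize in one sentence.", text)

if __name__ == "__main__":
    print(summarize(EchoChat(), "Composable patterns keep agents portable across providers."))
```

The same boundary makes it straightforward to swap clouds or agent frameworks without touching the pattern-level code.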
# Agents in Production
# Agents Frameworks

Devin Stein · Jul 28th, 2025
Agent memory and organizational knowledge are actually one and the same. Organizations remember the whys and how-tos by writing down learnings in a centralized place. For agents to remember, they need to do the same. But both suffer the same shortcoming: it's impossible to keep up to date. Can we solve both problems at once?
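As a toy illustration of the idea (not Dosu's product), agent memory and team knowledge can literally share one store: humans and agents write dated learnings to the same place, and retrieval surfaces the freshest entries first so stale facts are easy to spot and revise.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

STORE = Path("learnings.jsonl")  # one central place for humans and agents alike

def remember(topic: str, learning: str, author: str) -> None:
    """Append a dated learning; author is 'agent' or a teammate's name."""
    entry = {
        "topic": topic,
        "learning": learning,
        "author": author,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with STORE.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def recall(topic: str) -> list[dict]:
    """Return entries on a topic, newest first, so outdated ones are easy to spot."""
    if not STORE.exists():
        return []
    entries = [json.loads(line) for line in STORE.read_text().splitlines()]
    hits = [e for e in entries if e["topic"] == topic]
    return sorted(hits, key=lambda e: e["recorded_at"], reverse=True)

if __name__ == "__main__":
    remember("deploys", "Staging must be green before tagging a release.", "agent")
    print(recall("deploys"))
```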
# Agents in Production
# Facts Flywheel
# Dosu

Shraddha Yeole · Jul 28th, 2025
It’s 2 a.m., and a critical service slows down. Dashboards scream red—packet loss, timeouts, delays. The clock is ticking. Eyes race across a maze of graphs, flipping through visualizations and route tables. One graph leads to another. A dozen tabs open. Fatigue sets in. You’re left guessing: Is it the network, the application, or something else? Welcome to the new normal in network operations—where telemetry is endless, but clarity is rare. This session explores how AI and large language models (LLMs) transform observability by evolving views from data presentation to intelligent data interpretation. Instead of manually piecing together clues, imagine asking, “What’s wrong here?” and receiving clear, contextual insights. AI-powered storytelling augments human reasoning, reduces noise, and accelerates fault isolation—lowering misdiagnosis risk and improving mean time to identify (MTTI) and resolve (MTTR). Join us to see how storytelling is reshaping digital operations.
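The shift described here, from scanning dashboards to asking "What's wrong here?", can be prototyped with very little code. The sketch below (illustrative telemetry fields and model, not ThousandEyes' implementation) hands a compact snapshot of metrics to an LLM and asks for a ranked fault hypothesis and the next check to run.

```python
import json
from openai import OpenAI  # any OpenAI-compatible endpoint

client = OpenAI()

# Illustrative telemetry snapshot; in practice this comes from your metrics store.
snapshot = {
    "service": "checkout-api",
    "p95_latency_ms": {"baseline": 180, "last_15m": 2400},
    "packet_loss_pct": {"path_a": 0.1, "path_b": 7.8},
    "upstream_timeouts_per_min": 42,
    "recent_changes": ["route table updated 14 min ago"],
}

PROMPT = (
    "You are assisting a network operations engineer. Given this telemetry, "
    "explain the most likely fault, the evidence that supports it, and the next check to run:\n"
)

def whats_wrong(data: dict, model: str = "gpt-4.1-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT + json.dumps(data, indent=2)}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(whats_wrong(snapshot))
```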
# Agents in Production
# AI-augmented Troubleshooting
# ThousandEyes

Erica Hughberg · Jul 28th, 2025
AI agents aren't just generating content; they're generating traffic. Like any good agent, your AI agent isn't working alone. Behind the scenes is a mission-critical handler: the AI Gateway. In this lightning talk, we'll explore how gateways are evolving to handle the new realities of GenAI: dynamic routing, access control, cost-aware load balancing, model-aware failover, and observability across multi-model environments. If you're building agents or just trying to keep up with the traffic they generate, this talk will help you understand the infrastructure patterns emerging to support a new landscape of software.
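Two of the gateway behaviors listed above, cost-aware routing and model-aware failover, reduce to a small piece of logic at their core. A minimal sketch (route names, prices, and error handling are placeholders, not any particular gateway's implementation): prefer the most capable route the caller's budget allows, and fall back down the list when a backend fails.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    name: str
    cost_per_1k_tokens: float       # illustrative pricing
    call: Callable[[str], str]      # the actual backend invocation

def cheap_backend(prompt: str) -> str:
    return f"[cheap model answer to: {prompt}]"

def premium_backend(prompt: str) -> str:
    raise TimeoutError("simulated outage")  # forces the failover path in this demo

ROUTES = [
    Route("premium", cost_per_1k_tokens=8.0, call=premium_backend),
    Route("cheap", cost_per_1k_tokens=0.6, call=cheap_backend),
]

def route(prompt: str, max_cost: float) -> str:
    """Cost-aware selection with model-aware failover to the next viable route."""
    viable = sorted(
        (r for r in ROUTES if r.cost_per_1k_tokens <= max_cost),
        key=lambda r: -r.cost_per_1k_tokens,  # most capable route first, within budget
    )
    errors = []
    for r in viable:
        try:
            return r.call(prompt)
        except Exception as exc:  # in production: narrower exceptions, retries, metrics
            errors.append(f"{r.name}: {exc}")
    raise RuntimeError("all routes failed: " + "; ".join(errors))

if __name__ == "__main__":
    print(route("Summarize today's incident report.", max_cost=10.0))
```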
# Agents in Production
# Hidden Infrastructure
# Tetrate

Colin McNamara · Jul 28th, 2025
This talk shares the ongoing, real-world journey of building agentic infrastructure at AlwaysCool.ai—from simple GPT-based tools to our first production-ready AI microservices. We started with small wins like automating nutritional analysis and FDA label validation, but quickly ran into issues with sync limits, cost control, and debugging blind spots. That led us to build a shared agentic service layer, using LangGraph to orchestrate multi-step flows and FastAPI to serve those agents cleanly. With OpenTelemetry at the core, we now send metrics and traces to Prometheus, Grafana, and LangSmith for real-time visibility, which is critical for compliance workflows such as HACCP, CAPA, and FDA traceability. We’re not claiming to have it all figured out—this is a story of learning in the open, much like we do at the Austin AI Middleware Users Group (AIMUG). If you're navigating the same terrain—tooling decisions, observability gaps, or production pressure—this talk offers patterns, tools, and cautionary lessons worth carrying into your own journey.
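As a flavor of how those pieces fit together (a stripped-down sketch, not AlwaysCool.ai's actual service layer; exporter setup for Prometheus, Grafana, and LangSmith is omitted), a LangGraph flow can be compiled once and served behind a FastAPI endpoint, with each invocation wrapped in an OpenTelemetry span. The label-validation node below is a placeholder for the real compliance logic.

```python
from typing import TypedDict
from fastapi import FastAPI
from langgraph.graph import StateGraph, END
from opentelemetry import trace

tracer = trace.get_tracer("agentic-services")  # exporters are configured elsewhere

class State(TypedDict):
    label_text: str
    findings: str

def validate_label(state: State) -> dict:
    # Placeholder node; the real flow would call an LLM or a rules engine here.
    issues = [] if "ingredients" in state["label_text"].lower() else ["missing ingredients list"]
    return {"findings": "; ".join(issues) or "no issues found"}

# Build and compile the multi-step flow once at startup.
workflow = StateGraph(State)
workflow.add_node("validate_label", validate_label)
workflow.set_entry_point("validate_label")
workflow.add_edge("validate_label", END)
graph = workflow.compile()

app = FastAPI(title="label-validation-agent")

@app.post("/validate")
def validate(payload: dict) -> dict:
    # Each request runs the compiled graph inside a trace span for observability.
    with tracer.start_as_current_span("validate_label_flow"):
        result = graph.invoke({"label_text": payload.get("label_text", ""), "findings": ""})
    return {"findings": result["findings"]}
```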
# Agents in Production
# Console Scripts
# Agentic Services
# Always Cool.AI