Agents in Production 2025

Video

The Future of Compute: How AI Agents Are Reshaping Infrastructure // Diego Oppenheimer - Keynote // Agents in Production 2025

The rapid evolution of AI agents is exposing a widening gap between their unique computational needs and today’s infrastructure. This keynote cuts through the hype to highlight why traditional compute paradigms—mainframes, VMs, containers, even serverless—are struggling to keep up with agents’ bursty, stateful, and hardware-hungry workloads. We’ll examine the economic and technical inefficiencies organizations face, from unpredictable scaling to persistent state management, and why simply “tweaking the cloud” won’t cut it. Expect a candid look at the real operational challenges, the architectural dead-ends, and the tough question: do we adapt existing frameworks, or is it time for a radical rethink of how we design and manage compute for the AI era? Actionable insights, not wishful thinking.
# Agents in Production
# Hyperparam
# AI Agents
# AI infrastructure
Diego Oppenheimer · Jul 23rd, 2025
38:34
Video

Driving Evaluation-Driven Development with MLflow 3.0 // Yuki Watanabe // Agents in Production 2025

Quality is the top barrier preventing agentic applications from reaching production. This talk introduces Evaluation-Driven Development, a methodology that uses evaluation as the cornerstone for building high-quality, reliable agentic systems. We will demonstrate how to drive it with MLflow 3.0, a new generation of the popular MLOps platform redesigned for the LLM era, including one-line observability, automatic evaluation, human-in-the-loop feedback, and monitoring. Huge shout out to our sponsors @Databricks
# Agents in Production
# Databricks
# MLFLOW
Yuki Watanabe · Jul 23rd, 2025
28:13
Video

Beyond Chatbots: How to build Agentic AI systems with Google Gemini // Philipp Schmid // Agents in Production 2025

As AI continues to evolve, we will see a shift from static chatbots to dynamic agentic AI systems capable of autonomous reasoning, tool integration, and multi-step problem-solving. This talk explores how to design AI agents that leverage structured outputs, function calling, and workflow orchestration with Google Gemini.
# Agents in Production
# Google DeepMind
# Chatbots
# Google Gemini
Philipp Schmid · Jul 23rd, 2025
27:17
Video

Advancing the Cost-Quality Frontier in Agentic AI // Krista Opsahl-Ong // Agents in Production 2025

Enterprises love the promise of AI agents, but most projects stall in an endless loop of manual prompt tweaks, ambiguous evaluations, and ballooning inference costs. Agent Bricks, Databricks’ new agent builder platform, solves these pain points by turning a simple task description and data source into a production-ready, domain-specific agent that is optimized for both cost and quality. In this talk, we’ll give an overview of what Agent Bricks is and how you can get started with it, and discuss some of the research that powers it.
# Agents in Production
# Databricks
# Agentic AI
Krista Opsahl-Ong · Jul 23rd, 2025
12:45
Video

How to Build Execution Layers That Don’t Burn Out // Tanmay Tiwari // Agents in Production 2025

We’ve all built tools that looked good in demos but broke the moment we let go of the wheel. This talk is about building something different: systems that don’t just run, but run well when no one’s watching. In under 10 minutes, I’ll walk through how I designed an execution layer that handles thousands of operations daily without melting under pressure, without drifting off course, and without needing constant supervision. I’ll share the real structure:
• How it thinks
• How it decides what to do next
• How it knows when to stop
• And how it stays sane when things go wrong
No theory, no fluff, just what it took to make something dependable in a world that isn’t.
# Agents in Production
# Rivian
# Burnout
Tanmay Tiwari · Jul 25th, 2025
10:07
Video

From Guesswork to Greatness: Systematic AI Agent Optimization in Production // Nimrod Busany // Agents in Production 2025

Every engineer building AI agents has experienced it: you tweak a prompt, swap out a model, or adjust a RAG setting—only to find it either worsens the agent or improves one aspect while breaking another. Why does this happen? Because teams typically test just one configuration out of countless possible combinations, hoping for the best. Current evaluation tools are built for single-point assessments, not the extensive multi-dimensional comparisons that real-world scenarios demand. Sure, you might be able to A/B test two prompts or select from a few models, but exploring hundreds of configurations across dimensions like cost, latency, and accuracy simultaneously is nearly impossible. In this talk, we'll demonstrate how adopting a structured approach to testing alternatives can significantly change outcomes. Leveraging concepts from multi-objective optimization, we’ll illustrate how Traigent's SDK and UI empower engineers to allocate their testing budgets effectively. Traigent intelligently identifies and explores promising configurations, highlighting optimal tradeoffs. You'll learn how this methodology can yield quality improvements of 4–7x and reduce costs by up to 90%, all without resorting to guesswork or manual trial-and-error.
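To make the multi-objective idea concrete, here is a minimal sketch of a Pareto front over evaluated configurations. It is a generic illustration, not Traigent's SDK: the `Result` fields, the `dominates` and `pareto_front` helpers, and the configuration names and numbers are all assumptions made up for this example.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One evaluated agent configuration (all values are illustrative)."""
    config: str
    cost: float      # lower is better
    latency: float   # lower is better
    accuracy: float  # higher is better


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on every objective and better on one."""
    no_worse = a.cost <= b.cost and a.latency <= b.latency and a.accuracy >= b.accuracy
    better = a.cost < b.cost or a.latency < b.latency or a.accuracy > b.accuracy
    return no_worse and better


def pareto_front(results):
    """Keep only configurations that no other configuration dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


results = [
    Result("small-model", 0.10, 0.5, 0.70),
    Result("large-model", 1.00, 2.0, 0.90),
    Result("large-model-verbose", 1.20, 2.5, 0.88),
    Result("mid-model", 0.40, 1.0, 0.82),
]
front = pareto_front(results)
```

Here `large-model-verbose` drops out because `large-model` is cheaper, faster, and more accurate. The remaining three configurations are genuine tradeoffs, which is the set a team would then choose from.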
# Agents in Production
# AI Optimization
# Traigent
Nimrod Busany · Jul 25th, 2025
27:06
Video

Underwriting Assist - A Multi Agent System // Somya Rai | Maria Zhang // Agents in Production 2025

Underwriting Assist is a LangChain- and Ray-powered multi-agent system that accelerates insurance underwriting by 3x and cuts manual errors by 40%. It leverages RAG, shared memory, and LLM-based agents for clause analysis, risk profiling, and rationale generation. Real-time evals and human-in-the-loop feedback ensure accuracy, explainability, and regulatory compliance at scale.
# Multi-Agent System
# EXL
# Palona AI
Somya Rai & Maria Zhang · Jul 25th, 2025
33:27
Video

Building Agents for Healthcare // Lars Maaløe // Agents in Production 2025

Healthcare is one of the most vital and far-reaching sectors in our society, touching every individual at some point in their lives. Yet, it faces mounting challenges: rising administrative burdens, increasingly complex disease patterns, and growing patient volumes strain already stretched systems. In this talk, Lars will explore the untapped potential of AI agents to address some of healthcare’s most pressing real-world problems. He will present Corti’s unique approach to developing domain-specific agents equipped with healthcare-relevant skills—engineered not only for impact, but within a framework that places governance and safety at its core. Join us to learn how AI can be responsibly and powerfully deployed to support the future of care.
# Agents
# Healthcare
# corti.ai
Lars Maaløe · Jul 25th, 2025
30:34
Video

How to Stop AI Agents from Bleeding Your Cloud Budget // Advait Patel // Agents in Production 2025

As AI agents become active participants in production environments, handling infrastructure tasks, chaining tools, generating outputs, and executing plans across cloud services, the financial implications are often underestimated. These agents may appear intelligent, but they have zero awareness of cost boundaries. A single agent loop with poorly bounded retries, excessive API calls, or unrestricted tool usage can quietly rack up hundreds or even thousands of dollars in compute, token, or storage costs. In this session, I’ll walk through how seemingly harmless design decisions, like overly verbose prompts, excessive tool chaining, or unrestricted LLM usage, can result in runaway spending. I’ll share lessons from deploying agentic systems in cloud-native pipelines and infrastructure security tools, including my work on DockSec, an open-source AI-powered container security analyzer. We’ll explore how agents misbehave in cloud billing terms and what attendees can do to stop it. Attendees will learn practical strategies to monitor, contain, and optimize agent costs: from integrating cost observability into your agent stack, to programmatically setting retry, token, and API call budgets, to leveraging agent memory, caching, and behavior throttling to reduce waste. Whether they’re scaling agents in production or just starting to build them, this talk will give them the tools to design agent systems that are not only intelligent but also financially sustainable.
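One way to picture "programmatically setting retry, token, and API call budgets" is a small guard object that every agent step has to charge against. This is a sketch under stated assumptions, not the speaker's implementation: `RunBudget`, `BudgetExceeded`, `run_agent_loop`, the shape of the `step` callable, and the default limits are all hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent run crosses one of its hard limits."""


@dataclass
class RunBudget:
    # Limits are illustrative defaults, not recommendations.
    max_api_calls: int = 20
    max_tokens: int = 50_000
    max_retries: int = 3
    api_calls: int = 0
    tokens: int = 0
    retries: int = 0

    def charge_call(self, tokens_used: int) -> None:
        """Record one model or tool call; fail fast once a limit is crossed."""
        self.api_calls += 1
        self.tokens += tokens_used
        if self.api_calls > self.max_api_calls:
            raise BudgetExceeded(f"API call budget exceeded: {self.api_calls}")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget exceeded: {self.tokens}")

    def charge_retry(self) -> None:
        """Record one retry; fail once the retry limit is crossed."""
        self.retries += 1
        if self.retries > self.max_retries:
            raise BudgetExceeded(f"retry budget exceeded: {self.retries}")


def run_agent_loop(step, budget: RunBudget) -> str:
    """Call `step` until it reports completion or the budget stops the loop.

    `step` is a hypothetical callable returning (done, tokens_used, output).
    """
    while True:
        done, tokens_used, output = step()
        budget.charge_call(tokens_used)
        if done:
            return output
        budget.charge_retry()
```

The point of the design is that the loop cannot run unbounded: a step that never reports completion ends in `BudgetExceeded` instead of a surprise bill.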
# Agents in Production
# Cloud Budget
# Broadcom
Advait Patel · Jul 25th, 2025
31:05
Video

Machine Experience Engineering // Renato Byrro // Agents in Production 2025

Machine Experience (MX) Engineering is similar to UX Engineering, but focused on creating interfaces that efficiently address AI models' needs and are easy for them to understand. Existing APIs were developed for human software engineers. If we want LLMs to be reliable when calling tools/functions, we have to develop interfaces that are tailored for their reasoning model.
# Agents in Production
# Machine Experience Engineering
# Arcade.dev
Renato Byrro · Jul 25th, 2025
27:33
Video

The Facts Flywheel // Devin Stein // Agents in Production 2025

Agent memory and organizational knowledge are actually one and the same. Organizations remember the whys and how-tos by writing down learnings in a centralized place. For agents to remember, they need to do the same. But both suffer from the same shortcoming: it's impossible to keep them up to date. Can we solve both problems at once?
# Agents in Production
# Facts Flywheel
# Dosu
Devin Stein · Jul 28th, 2025
24:33
Video

Too much lock-in for too little gain: agent frameworks are a dead-end // Valliappa Lakshmanan // Agents in Production 2025

If your goal is to accelerate development of agentic systems without sacrificing production quality, a great choice is to use simple, composable GenAI patterns and off-the-shelf tools for monitoring, logging, and a few other capabilities. In this talk, I'll present an architecture consisting of such patterns that will enable you to build agentic systems in a way that does not lock you into any LLM, cloud, or agent framework. The patterns I talk about are from my GenAI design patterns book which is in early release on O'Reilly's platform.
# Agents in Production
# Agents Frameworks
Valliappa Lakshmanan · Jul 28th, 2025
35:37
Video

Evaluating AI Agents: Why It Matters and How We Do It // Annie Condon | Jeff Groom // Agents in Production 2025

As we integrate agentic AI into business products, robust evaluation of the agents is essential to delivering the highest quality. Proper evaluation ensures that AI agents are reliable, safe, effective, and aligned with user intent. Unlike traditional software or machine learning models, AI agents are non-deterministic and require specific types of evaluation. This talk outlines the importance of evaluating AI agents, the key components that we version and test at Acre Security, the metrics that matter for different types of agents, and how we currently achieve success evaluating AI agents that we build at Acre.
# Agents in Production
# Evaluating Agents
# Acre Security
Annie Condon & Jeff Groom · Jul 28th, 2025
13:27
Video

From Spikes to Stories: AI-Augmented Troubleshooting in the Network Wild // Shraddha Yeole // Agents in Production 2025

It’s 2 a.m., and a critical service slows down. Dashboards scream red—packet loss, timeouts, delays. The clock is ticking. Eyes race across a maze of graphs, flipping through visualizations and route tables. One graph leads to another. A dozen tabs open. Fatigue sets in. You’re left guessing: Is it the network, the application, or something else? Welcome to the new normal in network operations—where telemetry is endless, but clarity is rare. This session explores how AI and large language models (LLMs) transform observability by evolving views from data presentation to intelligent data interpretation. Instead of manually piecing together clues, imagine asking, “What’s wrong here?” and receiving clear, contextual insights. AI-powered storytelling augments human reasoning, reduces noise, and accelerates fault isolation—lowering misdiagnosis risk and improving mean time to identify (MTTI) and resolve (MTTR). Join us to see how storytelling is reshaping digital operations.
# Agents in Production
# AI-augmented Troubleshooting
# ThousandEyes
Shraddha Yeole · Jul 28th, 2025
11:49
Video

The Hidden Infrastructure Behind Every AI Agent // Erica Hughberg // Agents in Production 2025

AI agents aren't just generating content; they're generating traffic. Like any good agent, your AI agent isn’t working alone. Behind the scenes is a mission-critical handler: the AI Gateway. In this lightning talk, we'll explore how Gateways are adapting to the realities of GenAI: dynamic routing, access control, cost-aware load balancing, model-aware failover, and observability across multi-model environments. If you're building agents or just trying to keep up with the traffic they generate, this talk will help you understand the infrastructure patterns that are emerging to support a new landscape of software.
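Two of the gateway responsibilities named above, cost-aware routing and failover, can be sketched in a few lines. This is a toy illustration rather than how any particular AI Gateway works: the `Backend` type, its `cost_per_1k_tokens` field, the `call` function, and `route` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Backend:
    name: str
    cost_per_1k_tokens: float   # illustrative price, used only for ordering
    call: Callable[[str], str]  # hypothetical client function for this model


def route(prompt: str, backends: list) -> tuple:
    """Try backends from cheapest to most expensive, failing over on errors."""
    errors = []
    for backend in sorted(backends, key=lambda b: b.cost_per_1k_tokens):
        try:
            return backend.name, backend.call(prompt)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append(f"{backend.name}: {exc!r}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

A production gateway would also weigh health checks, rate limits, and per-model capabilities; the sketch only shows the ordering and fallback logic.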
# Agents in Production
# Hidden Infrastructure
# Tetrate
Erica Hughberg · Jul 28th, 2025
16:16
Video

From Console Scripts to Agentic Services: Building Observability into Everyday LLM Workflows // Colin McNamara // Agents in Production 2025

This talk shares the ongoing, real-world journey of building agentic infrastructure at AlwaysCool.ai—from simple GPT-based tools to our first production-ready AI microservices. We started with small wins like automating nutritional analysis and FDA label validation, but quickly ran into issues with sync limits, cost control, and debugging blind spots. That led us to build a shared agentic service layer, using LangGraph to orchestrate multi-step flows and FastAPI to serve those agents cleanly. With OpenTelemetry at the core, we now send metrics and traces to Prometheus, Grafana, and LangSmith for real-time visibility, which is critical for compliance workflows such as HACCP, CAPA, and FDA traceability. We’re not claiming to have it all figured out—this is a story of learning in the open, much like we do at the Austin AI Middleware Users Group (AIMUG). If you're navigating the same terrain—tooling decisions, observability gaps, or production pressure—this talk offers patterns, tools, and cautionary lessons worth carrying into your own journey.
# Agents in Production
# Console Scripts
# Agentic Services
# Always Cool.AI
Colin McNamara · Jul 28th, 2025
15:03
Video

Cost Optimization for Multi-agent systems // Mohamed Rashad // Agents in Production 2025

In the fast-advancing world of AI systems, cost optimization isn't a luxury: neglecting it can break your company. As these systems scale and grow in complexity, keeping expenses in check across infrastructure, model serving, latency management, and resource allocation becomes increasingly challenging. In this talk, we'll go through the critical challenges of cost management in multi-agent environments. We'll dive into real-world solutions, highlighting hybrid deployment strategies (cloud and on-prem), precise cost-tracking tools, and resource optimization tactics that have delivered significant savings without compromising performance. We will also cover real use cases and actionable insights drawn from our experience over the last decade.
# Agents in Production
# Cost Optimization
# DevisionX
Mohamed Rashad · Jul 30th, 2025
12:36
Video

Smart Agents Start with Smart LLM Choices // Shai Rubin // Agents in Production 2025

Everyone is building AI agents, but at their core is the LLM, and choosing the right one is critical. With new models launching every week, each promising game-changing productivity, how do we make informed, data-driven choices? In this talk, I’ll focus on LLM selection for a critical agent skill: code understanding. I’ll present a study applying 15 leading LLMs to real-world code summarization tasks, using practical, agent-relevant metrics like verbosity, latency, cost, human-aligned accuracy, and information gain. We’ll explore how these models actually perform in practice, beyond benchmarks and hype, and what that means for building effective, capable agents. Whether you’re building autonomous coding assistants, dev-focused copilots, or multi-modal agent systems, choosing the right LLM isn’t optional; it’s the foundation. This talk aims to cut through the noise and offer actionable insights to help you select the best model for your agent’s real-world success.
# Agents in Production
# LLM
# Smart Agents
# Strudel AI
Shai Rubin · Jul 30th, 2025
25:01
Video

Building Real-Time, Reliable Voice AI: From Simulation to Production // Brooke Hopkins | Peter Bakkum // Agents in Production 2025

Join Brooke Hopkins (Founder & CEO, Coval) and Peter Bakkum (API Multimodal Lead, OpenAI) for an insightful fireside chat focused on the cutting-edge voice-to-voice architectures powering modern voice AI applications. They’ll unpack the unique challenges of designing and deploying real-time, multimodal systems that enable seamless, natural conversations between users and AI agents. Drawing from Brooke’s expertise in simulation and evaluation at scale and Peter’s experience building OpenAI’s real-time APIs, this conversation will dive into how infrastructure, latency optimization, and rigorous testing come together to create reliable, production-ready voice AI experiences.
# Agents in Production
# Voice AI
# Coval
# OpenAI
Brooke Hopkins & Peter Bakkum · Jul 30th, 2025
57:13
Video

Fast, Trustworthy and Reliable Voice Agents: MLOps That Blend LLM Annotation with Human QA // Erik Goron // Agents in Production 2025

Voice agents live or die on latency and trust. I’ll share how HappyRobot’s MLOps pipeline turns raw production audio into high-accuracy, low-latency models:
1. Synthetic labels first: we generate large-scale annotations with reasoning LLMs.
2. Human in the loop: a targeted subset of samples is reviewed by human annotators to correct drift and refine prompts (DSPy-style).
3. Distill & specialize: small, domain-tuned models are fine-tuned via LoRA/distillation.
We’ll walk through our MLOps stack, from observability to AI-assisted data generation and model fine-tuning and optimization.
# Agents in Production
# Voice Agents
# LLM
# HappyRobot
Erik Goron · Jul 30th, 2025
17:08
Video

From Idea to Implementation: How to Self-Host an AI Agent // Meryem Arik // Agents in Production 2025

Generative AI and Agentic AI hold the potential to revolutionize everyday business operations. However, for highly regulated enterprises, security and privacy are non-negotiable and shared LLM API services often aren't appropriate. In this session, we will explore the open-source landscape and identify various applications where owning your own stack can lead to enhanced data privacy and security, greater customization, and cost savings in the long run. Our talk will take you through the entire process, from idea to implementation, guiding you through selecting the right model, deploying it on suitable infrastructure, and ultimately building a robust AI agent. By the end of this session, attendees will gain practical insights to enhance their ability to develop high-value Generative AI applications. You will leave with a deeper understanding of how to empower your organization with self-hosted solutions that prioritize control, customization, and compliance.
# Agents in Production
# Idea to Implementation
# Self-host
# Doubleword
Meryem Arik · Jul 30th, 2025
26:57
Video

If There's Free Compute, There's Abuse - Fighting Fraud with Lightweight LLM Agents // Jonas Scholz // Agents in Production 2025

At Sliplane, a managed container hosting platform, we offer developers an easy way to deploy their apps, with the unfortunate side effect of attracting abuse. From crypto miners to phishing kits, attackers love free trials. Traditional heuristics can detect DDoS or mining, but phishing and spam bots often hide behind innocuous-looking code. In this lightning talk, I’ll share how we built a lightweight AI agent pipeline that inspects user repositories when abuse is suspected. It runs on a self-hosted, small Mistral model and applies natural language policies to flag suspicious behavior, such as fake login pages or spam scripts. The agent picks and summarizes key files, checks them against soft rules, and helps a human-in-the-loop make faster decisions, all without sending private customer code to a third-party API. This talk goes into the architecture, cost optimizations to run small models, and why this beats rule-based filters at scale.
# Agents in Production
# Free Compute
# LLM
# Fraud
# Sliplane
Jonas Scholz · Jul 30th, 2025
15:07
Video

Catastrophic agent failure and how to avoid it // Edward Upton // Agents in Production 2025

Increasingly powerful agents are leading to increasingly high-stakes automation. From fighting fraud at scale using LLM pipelines to handling healthcare and insurance data with browser agents, I've observed my fair share of consequential agent failures. In this talk I'll share what I've learned about navigating the agent failure landscape.
# Agents in Production
# Agent Failure
# Asteroid AI
Edward Upton · Aug 1st, 2025
25:21
Video

The Infrastructure Imperative: What It Takes to Run Multi-Agent Systems // Dipanwita Mallick // Agents in Production 2025

As AI evolves from passive assistants to intelligent agents capable of reasoning, planning, and acting autonomously, the infrastructure supporting them must evolve too. Multi-agent systems require low-latency, scalable, and secure environments that enable real-time coordination, dynamic workloads, and continuous learning, often beyond what traditional cloud setups can deliver. This talk explores the infrastructure blueprint needed to support agentic AI at scale, including hybrid edge-cloud strategies and data-local compute. Learn what it takes to build a robust foundation for the next generation of intelligent, collaborative agents.
# Agents in Production
# Multi-Agent System
# HP
Dipanwita Mallick · Aug 1st, 2025
18:42
Video

Agents in Scrubs: Designing for the Complex Realities of Healthcare // Sarah Gebauer // Agents in Production 2025

Deploying AI agents in healthcare isn’t just a technical challenge; it’s a clinical one. As a physician working at the intersection of care delivery and machine intelligence, I’ll walk through what it really takes to make agents useful, safe, and credible in high-stakes environments like hospitals and clinics. This talk will focus on:
- What makes healthcare environments uniquely hard for agents: ambiguity, interruptions, human variability, and risk tolerance
- Why typical evaluation metrics often miss the mark, and what to measure instead (think: harm reduction, workflow fit, and appropriate escalation)
- How to scope agent autonomy to reflect the real-world roles of nurses, physicians, and support staff
- Where agents can shine in augmenting clinical work, and where they’re likely to fail without robust oversight
# Agents in Production
# Agents in Scrubs
# Validara Health
Sarah Gebauer · Aug 1st, 2025
16:03
Video

Building Multi-Player AI Systems with MeshAgent // Tula Masterman // Agents in Production 2025

Most AI systems still assume a single human working with one or more agents. In reality, work is a team sport: several people, several agents, one shared goal. MeshAgent turns that reality into software with secure Rooms: on-demand workspaces where every human and agent sees the same live context, abides by access controls, and is fully traceable. In this talk you'll learn how MeshAgent unlocks true multiplayer AI:
- Co-create in real time: launch a shared Room where humans and agents collaborate; invite colleagues via link, iterate live, and watch agents work alongside you.
- Add new agent teammates in minutes: stand up chat- or voice-capable agents with the Python, TypeScript, or Dart SDKs, and interact with them in your browser using MeshAgent Studio.
- Equip agents to ship actual work: plug in built-in MeshAgent tools or custom Tools so agents do more than just chat.
- Go to production as a team, worry-free: MeshAgent owns infra, scaling, logging, and cost dashboards, so your team focuses on outcomes, not ops.
# Agents in Production
# Multiplayer AI Systems
# MeshAgent
Tula Masterman · Aug 1st, 2025
13:53
Video

Managing Memory for AI Agents // Ben Labaschin // Agents in Production 2025

Drawing from my forthcoming publication, I'll explore the foundational decisions that determine whether AI agents deliver lasting value or become expensive technical debt. Rather than focusing on specific frameworks or tools, I'll cover the core tradeoffs between flexibility and performance, the memory patterns that actually matter for agent reliability, and how to architect systems that can evolve with the rapidly changing AI landscape. The key insight is understanding what problems agents fundamentally solve—automating complex, multi-step workflows—and designing memory and coordination systems around those core needs rather than getting caught up in today's specific technologies. You'll leave with a framework for making architectural decisions that will serve you well regardless of which models, frameworks, or tools become dominant next year.
# Agents in Production
# Managing Memory
# Workhelix
Ben Labaschin · Aug 1st, 2025
29:01
Video

Zero Trust for Multi-Agent Systems // Surendra Narang | Venkata Gopi Kolla // Agents in Production 2025

Don’t trust AI agents. Just because an agent is in your system doesn’t mean it should have overly permissive privileges. Restrict access: define a clear role for each agent from the beginning, and give each agent only the tools and information access it really needs. Monitor each agent's activity: keep an eye out for odd behavior or agents that step out of line. The sooner you catch it, the easier it is to fix. Keep things safe without making them slow: good security shouldn’t get in the way of your agents doing their job. You can have both speed and safety.
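The "clear role per agent, only the tools it needs, and monitoring" advice can be sketched as a deny-by-default tool gateway with an audit log. This is an illustration under assumptions, not the speakers' design: `AgentRole`, `ToolGateway`, and the example tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentRole:
    """A named role with an explicit allowlist of tools (deny by default)."""
    name: str
    allowed_tools: frozenset


@dataclass
class ToolGateway:
    """Single choke point through which every agent tool call must pass."""
    tools: dict  # tool name -> callable
    audit_log: list = field(default_factory=list)

    def call(self, role: AgentRole, tool_name: str, *args, **kwargs):
        allowed = tool_name in role.allowed_tools and tool_name in self.tools
        # Log every attempt, including denied ones, so odd behavior is visible.
        self.audit_log.append((role.name, tool_name, "allowed" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{role.name} may not call {tool_name}")
        return self.tools[tool_name](*args, **kwargs)


gateway = ToolGateway(tools={
    "read_ticket": lambda ticket_id: f"ticket {ticket_id}",
    "delete_user": lambda user_id: f"deleted {user_id}",
})
support = AgentRole("support_agent", frozenset({"read_ticket"}))
```

With this setup, `gateway.call(support, "read_ticket", 7)` succeeds, while `gateway.call(support, "delete_user", 7)` raises `PermissionError` and leaves a "denied" entry in `audit_log`.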
# Agents in Production
# Multi-Agent System
# Salesforce
# Paloalto
Surendra Narang & Venkata Gopi Kolla · Aug 1st, 2025
21:16
Video

A trace is worth a thousand logs // David de la Iglesia Castro // Agents in Production 2025

As AI Agents move from simple chatbots to complex, multi-step autonomous systems, our methods for understanding their behavior must evolve. We'll explore how a single, structured trace—capturing the full chain of thought, tool calls, and model interactions—provides a complete narrative of an agent's execution. These traces can be leveraged to rapidly debug failures, build robust evaluation suites, and create golden datasets for regression testing, ultimately enabling you to build more reliable and predictable agents.
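As a rough picture of what "a single, structured trace" can look like, the sketch below records nested spans for reasoning steps, tool calls, and model calls as plain dictionaries. It is a standard-library illustration, not the speaker's tooling or the OpenTelemetry API: the `Trace` class, the span field names, and the `kind` values are assumptions.

```python
import json
import time
import uuid
from contextlib import contextmanager


class Trace:
    """Collects the spans of one agent run as plain dictionaries."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans = []
        self._stack = []  # ids of currently open spans, innermost last

    @contextmanager
    def span(self, kind: str, name: str, **attributes):
        """Open a span; spans opened inside it are recorded as its children."""
        record = {
            "span_id": uuid.uuid4().hex[:8],
            "parent_id": self._stack[-1] if self._stack else None,
            "kind": kind,  # e.g. "agent", "llm", "tool"
            "name": name,
            "attributes": attributes,
            "start": time.time(),
        }
        self.spans.append(record)
        self._stack.append(record["span_id"])
        try:
            yield record
        except Exception as exc:
            record["error"] = repr(exc)  # keep failures inside the trace
            raise
        finally:
            record["end"] = time.time()
            self._stack.pop()

    def to_json(self) -> str:
        """Serialize the whole run, e.g. to store as a regression fixture."""
        return json.dumps({"trace_id": self.trace_id, "spans": self.spans})
```

A run would wrap each step, for example `with trace.span("tool", "search", query="..."):` nested inside an outer `with trace.span("agent", "answer_question"):`. The serialized result can then be replayed for debugging or saved as a golden example, provided the attributes are JSON-serializable.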
# Agents in Production
# Thousand Logs
# Mozilla.ai
David de la Iglesia Castro · Aug 4th, 2025
14:43
Video

Voice model performance optimization // Madison Kanna // Agents in Production 2025

Whether you're transcribing a conversation or vocalizing an agent response, STT and TTS models have to run fast. But these modalities introduce new challenges in both runtime performance and network overhead. By optimizing open source models, you can achieve consistent low latencies for both transcription and speech synthesis. In this talk, we'll cover key optimization strategies for Whisper and Orpheus with a focus on real-time workloads, plus a couple of common mistakes to avoid to reduce network overhead.
# Agents in Production
# Voice Agents
# Baseten.co
Madison Kanna · Aug 4th, 2025
16:32