Agents in Production 2025
# Agents in Production
# Reinforcement Learning
# Kentauros AI
How Reinforcement Learning can Improve your Agent // Patrick Barker // Agents in Production 2025
In this talk we will bring to light the open secret in the AI community: most agents don't work reliably. We'll explore the most common ways agents fail, highlighting how fundamental issues with the model often can't be overcome with prompting. If that's true, then why aren't we correcting those paths in the model itself? Reinforcement learning offers the most promising path to reliable agents, and designing reward signals is the future of agentic development. In the next few years we will transition from agents that are programmed deterministically to agents that are taught interactively. We don't need to be stuck in the ice age of frozen models; we can take our agents to the next level of success.
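The reward-design idea can be made concrete with a toy sketch (hypothetical, not Kentauros AI's code): score a whole agent trajectory so that reaching the goal is rewarded while wasted steps and failed tool calls are penalized.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str        # e.g. "click", "type", "call_api"
    succeeded: bool    # did the tool call work?

def trajectory_reward(steps, goal_reached, step_cost=0.05, fail_cost=0.1):
    """Score a full agent trajectory: +1 for reaching the goal,
    minus a small cost per step and a larger cost per failed call."""
    reward = 1.0 if goal_reached else 0.0
    reward -= step_cost * len(steps)
    reward -= fail_cost * sum(1 for s in steps if not s.succeeded)
    return reward
```

An RL loop optimizing a signal like this prefers short, successful trajectories over long, flaky ones: the kind of path correction in the model that prompting alone cannot achieve.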

Patrick Barker · Aug 6th, 2025

Ryan Fox-Tyler · Aug 6th, 2025
What happens when you empower AI agents to design, configure, and deploy other agents? At Hypermode, we put this question to the test by developing Concierge—an agent that acts as both architect and orchestrator, assembling custom agent workflows on demand. In this session, I’ll share the technical journey behind building Concierge, our “agent that builds agents,” and how it’s reshaping the way teams approach automation and task completion. Key topics will include:
- The architecture and design patterns enabling agent creation
- How Concierge leverages natural language and user intent to assemble tailored agent teams
- Real-world challenges: managing reliability, evaluation, and guardrails when agents are in charge
- Lessons learned from deploying agent-built agents in production environments
- The future of agentic systems: towards self-improving, self-deploying AI teams
# Agents in Production
# Agents hiring teams
# Hypermode
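As an illustrative sketch of the "agent that builds agents" idea (assumptions only, not Hypermode's actual Concierge API): an orchestrator can map natural-language intent onto a team of specialist agents drawn from a capability registry.

```python
# Hypothetical registry: capability name -> tools that specialist gets.
AGENT_REGISTRY = {
    "research": ["web_search", "summarize"],
    "reporting": ["query_db", "render_chart"],
}

def assemble_team(intent: str) -> dict:
    """Pick specialist agents whose capability name appears in the
    user's natural-language intent, and return a workflow spec."""
    team = {name: tools for name, tools in AGENT_REGISTRY.items()
            if name in intent.lower()}
    return {"intent": intent, "agents": team}
```

A real orchestrator would use a model rather than substring matching to interpret intent, but the shape is the same: intent in, configured agent team out.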

Adam Sroka · Aug 6th, 2025
What happens when AI agents stop just suggesting next steps – and start running the project? In this talk, Dr. Adam Sroka (CEO & Co-Founder, Hypercube) shares the lessons learned building Jellyfish, an agentic AI platform designed to manage the end-to-end lifecycle of renewable energy assets and deliver 50% efficiency gains. Built to replace time-consuming, manual work done by project managers, analysts, and engineers, Jellyfish combines proprietary AI models with multi-agent workflows to automate planning, data collation, reporting, and real-time analysis – with human oversight built in. Adam will break down the system architecture, from workflow design and automation strategies to real-time analytics and user accessibility. He’ll also touch on the commercialization path, including IP considerations, and the broader role platforms like this will play in accelerating net-zero targets.
# Agents in Production
# LLM
# Energy Lifecycle
# Hypercube consulting

Allegra Guinan · Aug 6th, 2025
Voice agents are increasingly handling our most sensitive data, from healthcare records to financial information. We inherently trust voices more than text, a psychological bias that creates a unique responsibility: we must design voice agents that honor the trust users naturally place in them. This talk explores how thoughtful design choices shape responsible voice AI deployment. We'll examine how interface design affects meaningful consent, how conversation flows impact privacy, and how voice patterns influence trust. Drawing from real-world examples, we'll cover practical design principles for voice agents handling sensitive data. As voice becomes the primary interface for AI systems, getting these design fundamentals right isn't just good UX; it's an ethical imperative.
# Agents in Production
# Voice Agents
# GuardionAI
# Lumiera

Robert Caulk · Aug 6th, 2025
Synthetic data plays an important role in the news ecosystem. Publishers are now monetizing a synthetic version of their data to help feed news-hungry agents in the wild. We discuss how grounded synthetic news data not only protects publishers against copyright infringement, but also reduces hallucination rates for broad agents built to use hundreds of tools. As agents become better and better generalists, the data that they retrieve via tool-use needs to be packed up and "Context Engineered" for quality and ease of consumption. The ancient adage has never been more relevant: "Quality in -> Quality out". Enter the world's largest news knowledge graph. A perfectly searchable, highly accurate, news context delivery machine - geared for high-stakes agentic decision-making tasks far and wide. Some tasks include fact-checking, geopolitical risk analysis, event forecasts for prediction markets, and much, much more.
# Agents in Production
# Synthetic Data
# Publishing
# AskNews
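A toy illustration of the "context engineering" idea (assumed structure, not the actual news knowledge graph): store grounded facts as subject-relation-object triples and package only the relevant ones into an agent-ready context block.

```python
# Hypothetical grounded triples; a real news graph holds millions.
TRIPLES = [
    ("AcmeCorp", "acquired", "WidgetCo"),
    ("AcmeCorp", "headquartered_in", "Berlin"),
    ("WidgetCo", "founded_in", "2019"),
]

def context_for(entity: str) -> str:
    """Return only the facts touching the entity, rendered as plain
    sentences: compact, grounded context an agent can consume."""
    facts = [f"{s} {r.replace('_', ' ')} {o}"
             for s, r, o in TRIPLES if entity in (s, o)]
    return "\n".join(facts)
```

Delivering pre-filtered, grounded facts instead of raw articles is one way "quality in" reduces hallucination on the way out.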

Simba Khadder · Aug 6th, 2025
Agents are only as useful as the data they can access. EnrichMCP turns your existing data models, like SQLAlchemy schemas, into an agent-ready MCP server. It exposes type-checked, callable methods that agents can discover, reason about, and invoke directly. In this session, we’ll connect EnrichMCP to a live database, run real agent queries, and walk through how it builds a semantic interface over your data. We’ll cover relationship navigation (like user to orders to products), how input and output are validated with Pydantic, and how to extend the server with custom logic or non-SQL sources. Finally, we’ll discuss performance, security, and how to bring this pattern into production.
# Agents in Production
# Enriching MCP
# Featureform
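The pattern EnrichMCP automates can be sketched with the standard library alone (a hypothetical stand-in, not EnrichMCP's real API): a typed data model becomes a discoverable, type-checked tool an agent can reason about and invoke.

```python
from dataclasses import dataclass, fields

@dataclass
class User:
    id: int
    email: str

# Stand-in for a live database table.
DB = {1: User(1, "a@example.com")}

def describe_tool():
    """What an agent 'discovers': the tool name plus typed parameters
    derived from the data model, rather than a free-form prompt."""
    return {"name": "get_user",
            "params": {f.name: f.type for f in fields(User)}}

def get_user(user_id) -> User:
    """Validate input before touching data, so the agent gets a clear
    error instead of a silent mismatch."""
    if not isinstance(user_id, int):
        raise TypeError("user_id must be int")
    return DB[user_id]
```

EnrichMCP derives this interface from real SQLAlchemy schemas and validates with Pydantic; the sketch shows only the shape of the contract.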

Shahul Elavakkattil Shereef · Aug 6th, 2025
If you’re an engineer building AI Agents, you probably know how hard it is to consistently improve them. But I think it’s not that hard—if you have the right mental framework to solve the problem. That framework is Eval-Driven Development—a fancy way of applying the scientific method to building ML systems. Fundamentally, it’s about iterating on ML systems using science (EDD) rather than art (vibe checks). In this session, we’ll explore how one can use the ideas of experimentation and evaluation to improve any AI agent consistently. We’ll also learn how to use LLMs as effective proxies for human judgment (evals), build a data flywheel for improving their alignment, choose the right metrics, and set up feedback loops from production to identify and improve long-tail scenarios.
# Agents in Production
# EDD
# AI Agents
# Ragas
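The eval-driven loop can be sketched in a few lines (a toy where string functions stand in for agent variants): measure both variants on a fixed eval set and promote the candidate only when the metric improves.

```python
def evaluate(agent, eval_set):
    """Score an agent as the fraction of eval cases answered correctly."""
    return sum(agent(case["input"]) == case["expected"]
               for case in eval_set) / len(eval_set)

# Two "agent" variants; in practice these would be prompt or model versions.
baseline = lambda q: q.upper()
candidate = lambda q: q.upper().strip()

eval_set = [
    {"input": " hi ", "expected": "HI"},
    {"input": "ok", "expected": "OK"},
]

# Ship the candidate only if the measured metric improves: science, not vibes.
best = candidate if evaluate(candidate, eval_set) >= evaluate(baseline, eval_set) else baseline
```

In a real system the exact-match check would be replaced by an LLM judge aligned against human labels, but the experiment loop stays the same.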

Dexter Horthy · Aug 6th, 2025
Hi, I'm Dex. I've been hacking on AI agents for a while. I've tried every agent framework out there, from the plug-and-play crew/langchains to the "minimalist" smolagents of the world to the "production grade" LangGraph, Griptape, etc. I've talked to a lot of really strong founders who are all building really impressive things with AI. Most of them are rolling the stack themselves. I don't see a lot of frameworks in production customer-facing agents. I've been surprised to find that most of the products out there billing themselves as "AI Agents" are not all that agentic. A lot of them are mostly deterministic code, with LLM steps sprinkled in at just the right points to make the experience truly magical. Agents, at least the good ones, don't follow the "here's your prompt, here's a bag of tools, loop until you hit the goal" pattern. Rather, they are mostly just software. So, I set out to answer: What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
# Agents in Production
# 12 Factor Agents
# HumanLayer
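The "mostly deterministic code, with LLM steps sprinkled in" pattern can be sketched as follows; the llm() stub is a placeholder for a real model call, and the ticket-routing scenario is invented for illustration.

```python
def llm(prompt: str) -> str:
    """Stub standing in for one narrow model call: classify intent."""
    return "refund" if "money back" in prompt else "other"

def handle_ticket(text: str) -> str:
    # Deterministic pre-checks first: no model needed for empty input.
    if not text.strip():
        return "ignored"
    # The single LLM step, scoped to one decision.
    intent = llm(f"Classify this support ticket: {text}")
    # Deterministic routing after the model step.
    routes = {"refund": "billing-queue", "other": "triage-queue"}
    return routes[intent]
```

Control flow, error handling, and routing stay in plain software; the model handles only the one judgment software can't make.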

Stephanie Kirmer · Aug 6th, 2025
Evaluating LLM performance is vital to successfully deploying AI to production settings. Unlike regular machine learning where you can measure accuracy or error rates, with text generation you're dealing with something much more subjective, and need to find ways to quantify the quality. As we combine LLMs together and add other tools in the agentic context, this becomes even more challenging, requiring robust evaluation techniques. In this talk I propose an approach to this evaluation that borrows from academic evaluation - namely, creating clear rubrics that spell out what success looks like in as close to an objective fashion as possible. Armed with these, we can deploy additional tested LLMs to conduct evaluation. The result is highly efficient and solves much of the evaluation dilemma, although there are still gaps that I will also discuss. (This is an adaptation of an article I wrote: https://towardsdatascience.com/evaluating-llms-for-inference-or-lessons-from-teaching-for-machine-learning)
# Agents in Production
# Evaluating LLMs
# DataGrail
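The rubric idea can be made concrete with a stdlib sketch: explicit, near-objective criteria checked independently, then aggregated. Here programmatic checks stand in for the tested LLM judge the talk proposes; the criteria themselves are invented examples.

```python
# Each rubric item spells out what success looks like for one criterion.
RUBRIC = [
    ("cites_source", lambda ans: "source:" in ans.lower()),
    ("under_100_words", lambda ans: len(ans.split()) < 100),
    ("answers_question", lambda ans: len(ans.strip()) > 0),
]

def grade(answer: str) -> float:
    """Score a generated answer as the fraction of rubric criteria met."""
    passed = [name for name, check in RUBRIC if check(answer)]
    return len(passed) / len(RUBRIC)
```

Because each criterion is narrow and explicit, a judge model (or a human spot-checker) can be validated per criterion rather than on one fuzzy overall score.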

Hakan Tek · Aug 4th, 2025
This talk explores how AI can be integrated into legacy systems without complete rewrites. Through real-world examples, we’ll cover practical methods for adding AI-driven features like automation and predictive insights on top of existing infrastructures. The focus is on low-disruption, high-impact integration strategies for businesses adapting to an AI-driven future.
# Agents in Production
# AI Integration
# Digital Data

Pierre Gerardi · Aug 4th, 2025
Is there value in implementing AI in your business processes? Undeniably! But as more AI solutions are deployed, the maintenance burden increases, and it becomes harder for users to find and use the right tools. The marginal benefits of single-point agents are limited; real gains come from creating a platform where different agents can collaborate and perform tasks autonomously. At the Port of Antwerp-Bruges, we've developed APICA, a multi-agent platform. APICA acts as a single digital colleague integrated into Microsoft Teams. Users interact through one familiar interface, while behind the scenes, APICA coordinates specialized AI agents to handle complex tasks. In this talk, we’ll share how we built APICA, what the architecture looks like, and how agents collaborate. We'll walk through a case study of our nautical agent, which processes maritime SQL data to answer questions. Finally, we’ll provide practical insights into the challenges we faced and what we’re still working to improve.
# Agents in Production
# APICA
# Superlinear
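A minimal sketch of the nautical-agent pattern (hypothetical, not the APICA codebase): a specialist agent that answers one narrow class of questions by querying maritime SQL data, and explicitly refuses everything else so the orchestrator can route elsewhere.

```python
import sqlite3

def nautical_agent(question: str, conn: sqlite3.Connection) -> int:
    """Answer a narrow class of maritime questions via SQL; anything
    out of scope raises, so the coordinator can pick another agent."""
    if "how many vessels" in question.lower():
        (count,) = conn.execute("SELECT COUNT(*) FROM vessels").fetchone()
        return count
    raise ValueError("out of scope for the nautical agent")
```

A real specialist would generate SQL from the question; keeping each agent's scope narrow is what lets a coordinator like APICA present many of them as one digital colleague.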