Agents in Production - Prosus x MLOps
# Agents in Production
# Prosus Group
# Agent Simulations
Simulate to Scale: How realistic simulations power reliable agents in production // Sachi Shah
In this session, we’ll explore how developing and deploying AI-driven agents demands a fundamentally new testing paradigm, and how scalable simulations deliver the reliability, safety and human-feel that production-grade agents require. You’ll learn how simulations allow you to:
- Mirror messy real-world user behavior (multiple languages, emotional states, background noise) rather than scripting narrow “happy-path” dialogues.
- Model full conversation stacks including voice: turn-taking, background noise, accents, and latency, not just text messages.
- Embed automated simulation suites into your CI/CD pipeline so that every change to your agent is validated before going live.
- Assess multiple dimensions of agent performance (goal completion, brand-compliance, empathy, edge-case handling) and continuously guard against regressions.
- Scale from “works in demo” to “works for every customer scenario” and maintain quality as your agent grows in tasks, languages or domains.

Whether you’re building chat, voice, or multi-modal agents, you’ll walk away with actionable strategies for incorporating simulations into your workflow: improving reliability, reducing surprises in production, and enabling your agent to behave as thoughtfully and consistently as a human teammate.

Sachi Shah · Dec 3rd, 2025

Mefta Sadat · Nov 27th, 2025
Developing AI agents for shopping is just the first step; the real challenge is reliably running them in production across complex, mission-critical e-commerce systems, a significant MLOps hurdle. In this talk, we'll introduce Alfred, our agentic orchestration layer. Built with tools like LangGraph, Langfuse, LiteLLM, and Google Cloud components, Alfred is the critical piece that coordinates LLMs with our entire e-commerce backend, from search and recommendations to cart management. It handles the complete execution graph, secured tool calling, and prompt workflow. We’ll share our journey in designing a reusable agent architecture that scales across all our digital properties. We’ll discuss the specifics of our tech stack and productionization methodology, including how we leveraged the MCP framework and our existing platform APIs to accelerate development of Alfred.
# Agents in Production
# Prosus Group
# Agentic Commerce

Phil Stafford · Nov 27th, 2025
Single-agent LLM systems fail silently in production - they're confidently wrong at scale with no mechanism for self-correction. We've deployed a multi-agent orchestration pattern called "structured dissent" where believer, skeptic, and neutral agents debate decisions before consensus. This isn't theoretical - we'll show production deployment patterns, cost/performance tradeoffs, and measurable reliability improvements. You'll learn when multi-agent architectures justify the overhead, how to orchestrate adversarial agents effectively, and operational patterns for monitoring agent reasoning quality in production. Our first deployment of the debate swarm revolves around MCP servers - we use a security swarm specially built for MCP servers to analyze findings from open source security tools. This provides more nuanced reasoning and gives a confidence score to evaluate the security of unknown MCP tools.
# Agents in Production
# Prosus Group
# Production Reliability
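
To make the "structured dissent" idea above concrete, here is a minimal sketch, not the speaker's implementation. It assumes three hypothetical rule-based stand-ins for the believer, skeptic, and neutral agents (real agents would be LLM calls) and replaces the multi-round debate with a single confidence-weighted vote. The names `Verdict`, `structured_dissent`, and the keyword rules are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    role: str
    approve: bool
    confidence: float  # 0.0 to 1.0


Agent = Callable[[str], Verdict]


# Hypothetical stand-ins for LLM-backed agents; the rules are made up.
def believer(finding: str) -> Verdict:
    return Verdict("believer", True, 0.6)


def skeptic(finding: str) -> Verdict:
    if "shell" in finding:
        return Verdict("skeptic", False, 0.9)
    return Verdict("skeptic", True, 0.3)


def neutral(finding: str) -> Verdict:
    return Verdict("neutral", "read-only" in finding, 0.5)


def structured_dissent(finding: str, agents: List[Agent]) -> Tuple[bool, float]:
    """Return (approved, score), where score is the mean signed confidence."""
    verdicts = [agent(finding) for agent in agents]
    signed = [v.confidence if v.approve else -v.confidence for v in verdicts]
    score = sum(signed) / len(signed)
    return score > 0, score


agents = [believer, skeptic, neutral]
risky_ok, risky_score = structured_dissent("tool executes shell commands", agents)
safe_ok, safe_score = structured_dissent("read-only file listing", agents)
```

In this toy setup the skeptic's strong objection outweighs the believer on the shell finding, so it is rejected, while the read-only finding is approved. A production version would presumably let agents see and respond to each other's arguments before the final score is computed.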

Hamed Taheri · Nov 27th, 2025
Personalization at scale needs deep understanding of each customer. You must collect data from many sources, read it, reason and infer, plan, decide, act, and write to each person. One agent doing everything gave us poor and inconsistent quality. Multi-agent systems changed that. They deliver mass personalization. They also break in edge cases, contradict each other, and are hard to debug. I will share how we addressed this with Cortex UCM, a unified customer memory, and Generative Tables. We map noisy data into a clean, structured layer that agents read and write. We began with email for both outbound and inbound communication. Then we personalized websites and product pages for e-commerce at scale. I share customer stories. For example, one customer had over 60,000 product pages that required customization for thousands of communities and product offerings. I will present our decentralized shared-memory orchestration briefly and how it stays transparent and debuggable. It opens safe paths for external agents. What failed. What worked. What we are building next.
# Agents in Production
# Prosus Group
# Multi-Agent Personalities

Matt Sharp · Nov 27th, 2025
There's never been a better time to be a hacker. With the explosion of vibe-coded solutions full of vulnerabilities and the power and ease that LLMs and Agents lend to hackers, we are seeing an increase in attacks. This talk dives into several vulnerabilities that agent systems have introduced and how they are already being exploited.
# Agents in Production
# Prosus Group
# Hacker

Dirk Petzoldt · Nov 27th, 2025
We reflect on how the complexity of an agent analytics project at an international pharma company taught us to move from prompt engineering to context engineering, empowering agents with interactive tooling to build their context dynamically.
# Agents in Production
# Prosus Group
# Enterprise Analytics

Santoshkalyan Rayadhurgam · Nov 27th, 2025
Search is still the front door of most digital products—and it’s brittle. Keyword heuristics and static ranking pipelines struggle with messy, ambiguous queries. Traditionally, fixing this meant years of hand-engineering and expensive labeling. Large language models change that equation: they let us deploy agents that act like search engineers—rewriting queries, disambiguating intent, and even judging relevance on the fly. In this talk, I’ll show how to put these agents to work in real production systems. We’ll look at simple but powerful patterns—query rewriting, hybrid retrieval, agent-based reranking—and what actually happens when you deploy them at scale. You’ll hear about the wins, the pitfalls, and the open questions. The goal: to leave you with a practical playbook for how agents can make search smarter, faster, and more adaptive—without turning your system into a black box.
# Agents in Production
# Prosus Group
# Search Engine Agents
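
One common way to combine keyword and vector results in hybrid retrieval is reciprocal rank fusion (RRF). The abstract above does not say which fusion method the talk uses, so treat this as an illustrative sketch: the document IDs and the two input rankings are made up, and `k=60` is the conventional default constant for RRF.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists; each document scores sum(1 / (k + rank))."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword retriever and a vector retriever.
keyword_hits = ["d1", "d2", "d3"]
vector_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear in both lists rise to the top, which is why `d1` and `d3` outrank `d2` and `d4`. An agent-based reranker could then reorder only this short fused list rather than the full candidate set.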

Benjamin Guo · Nov 27th, 2025
I went viral for spending $9k on Cursor in 1 month, and I wrote half a million lines of code building Zo Computer over the last 4 months. In this talk, I'll share everything I would've wanted to know when I was just getting started. Coding with AI done right is *leveraged thinking*, and I'll share the workflows we use on our team to ship high-quality code at breakneck speed.
# Agents in Production
# Prosus Group
# Velocity Coding

Rekha Singhal · Nov 27th, 2025
Traditionally, an enterprise has had to undergo a transformation of its business processes or IT systems whenever an internal disruptor (such as a merger or a new service) or an external one (e.g., a new technology like GenAI, or a pandemic) arrives. This leads to a long catch-up time for the enterprise. We can instead design our applications to be resilient to these disruptions by using adaptable IT systems that leverage future advancements in computing and scale toward zero downtime. Today’s AI-based applications may involve traditional computing, SQL processing, deep learning, and GenAI model inference, so they are heterogeneous in their demands for compute and memory bandwidth. Some applications, such as molecular simulation and portfolio optimization, are intractable and not solvable with traditional compute. Also, the workloads on these applications keep varying throughout their life cycle. On the deployment side, the range of computing accelerators today extends beyond traditional general-purpose computing to low-power, AI-specialized hardware such as Inferentia, Graphcore, SambaNova, and Cerebras, many of which are accessible on public and private clouds, and to special hardware such as quantum computers for intractable problems. Further, with the death of Moore’s law and the demands of power-hungry AI models, there is a shift from silicon-based computing to physics-based computing for AI applications, such as photonic, neuromorphic, analog, and DNA computing paradigms. Achieving the optimal balance of latency, throughput, cost, and energy requires large design-space exploration across hardware architectures. This motivates us to build intelligent middleware for Pareto-optimal mapping of application components to a heterogeneous hardware ecosystem. This asset can recommend high-performance, low-cost, low-energy deployment options for enterprise applications.
# Agents in Production
# Prosus Group
# Agentic AI Systems
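
As a small illustration of what "Pareto-optimal" means for the deployment options described above, the sketch below filters a list of candidates down to those not dominated on latency, cost, and energy (lower is better for all three). The option names and numbers are invented for the example, and the talk's middleware is not described in enough detail to claim it works this way. Throughput could be added as a fourth objective by negating it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Option:
    name: str
    latency_ms: float
    cost_usd: float
    energy_j: float

    def objectives(self) -> Tuple[float, float, float]:
        return (self.latency_ms, self.cost_usd, self.energy_j)


def dominates(a: Option, b: Option) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    pairs = list(zip(a.objectives(), b.objectives()))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(options: List[Option]) -> List[Option]:
    """Keep only the options that no other option dominates."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


# Illustrative, made-up measurements for one application component.
options = [
    Option("cpu", 120.0, 1.0, 50.0),
    Option("gpu", 20.0, 4.0, 200.0),
    Option("accelerator", 25.0, 3.0, 80.0),
    Option("old-gpu", 30.0, 5.0, 250.0),
]

front_names = [o.name for o in pareto_front(options)]
print(front_names)  # ['cpu', 'gpu', 'accelerator']
```

Here `old-gpu` is dropped because `gpu` beats it on all three objectives; the remaining options each win on at least one objective, so choosing among them requires a business preference rather than more computation.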

Benjamin Hindman · Nov 27th, 2025
Recently, we’ve heard smart people we respect say things like: “There's a lot of buzz around MCP. I'm not convinced it needs to exist.” In this talk, we will argue that MCP differs from other modern protocols like gRPC and HTTP primarily due to its inherent statefulness. And, that this statefulness is required to bring about the rich, intelligent, long-lived, human-like communication that we want with AI. It is a feature, not a bug. Consider a human-to-human phone conversation: you're on the phone, the call drops, and you dial back. You don’t expect to begin your conversation all over again. Instead, you'd anticipate resuming from where you left off, or at least very close to it. To achieve something similar from human-to-computer, the protocol needs ways to pick up where the chat left off. More so than schemas and a clean separation of prompts, resources, and tools, this is what excites us the most about MCP. This talk will explore the aspects of MCP that enable fault-tolerance as well as features that have remained relatively obscure like elicitation (which enables the MCP server to ask questions or elicit feedback from humans), and sampling (which enables the MCP server to invoke the LLM itself). We will also discuss the current MCP client landscape as well as some important SEPs (Specification Enhancement Proposals) coming down the pike, like SEP-1391, in which we have been involved.
# Agents in Production
# Prosus Group
# MCP

Quinten Rosseel · Nov 27th, 2025
When your head of data goes on paternity leave, you learn whether your AI agent actually works. For a logistics SaaS company with a 2.5-person data team, our AI analyst "Wobby" became the unexpected backup, handling 60% of incoming data questions from the business. This talk shares the hard-won lessons from taking an AI agent from concept to daily use. You'll learn why we abandoned our web UI for Slack, why BIRD benchmark scores meant nothing for our actual success, and how we built an eval system that caught real failure modes instead of synthetic ones. We'll cover the technical decisions that mattered: context engineering, metadata design, and latency optimization. We'll also cover the non-technical ones that mattered more: channel design, user onboarding, and building trust with skeptical business users. This is a practitioner's guide to agent deployment. What worked, what failed spectacularly, and what we'd do differently next time.
# Agents in Production
# Prosus Group
# Agents actually in Production

