MLOps Community

Agents in Production - Prosus x MLOps

# Agents in Production
# Prosus Group
# MCP Security

MCP Security: What Happens When Your Agents Talk to Everything? // Rosemary Nwosu-Ihueze

MCP lets your agents connect to Slack, GitHub, your database, and whatever else you throw at it. Great for productivity. Terrible for security. When an agent can call any tool through any protocol, you've got a problem: who's actually making the request? What can they access? And when something breaks—or gets exploited—how do you even trace it back? This talk covers what breaks when agents go multi-protocol: authentication that doesn't account for agent delegation, permission models designed for humans not bots, and audit trails that disappear when Agent A spawns Agent B to call Tool C. I'll walk through real attack scenarios—prompt injection leading to unauthorized API calls, credential leakage across protocol boundaries, and privilege escalation through tool chaining. Then we'll dig into what actually works: identity verification at protocol boundaries, granular permissions that follow context, not just credentials, and audit systems built for non-human actors. You'll leave knowing how to implement MCP without turning your agent system into an attack surface, and what to build (or demand from vendors) to keep agent-to-tool communication secure.
Rosemary Nwosu-Ihueze · Dec 19th, 2025
Tom Kaltofen · Dec 11th, 2025
Modern AI agents depend on vast amounts of context (data, features, and intermediate states) to make correct decisions. In practice, this context is often tied to specific datasets or infrastructure, leading to brittle pipelines and unpredictable behaviour when agents move from prototypes to production. This talk introduces mloda, an open‑source Python framework that makes data, feature, and context engineering shareable. By separating what you compute from how you compute it, mloda provides the missing abstraction layer for AI pipelines, allowing teams to build deterministic context layers that agents can rely on. Attendees will learn how mloda's plugin‑based architecture (minimal dependencies, BYOB design) enables clean separation of transformation logic from execution environments. We'll explore how built‑in input/output validation and test‑driven development help you build robust contexts. The session will demonstrate how mloda can generate production‑ready data flows. Real‑world examples will show how mloda enables deterministic context layers from laptop prototypes to cloud deployments.
# Agents in Production
# Prosus Group
# Context Layers
Sam Partee · Dec 10th, 2025
Building agentic tools for production requires far more than a simple chatbot interface. The real value comes from agents that can reliably take action at scale, integrate with core systems, and execute tasks through secure, controlled workflows. Yet most agentic tools never make it to production. Teams run into issues like strict security requirements, infrastructure complexity, latency constraints, high operational costs, and inconsistent behavior. To understand what it takes to ship production-grade agents, let's break down the key requirements one by one.
# Agents in Production
# Prosus Group
# Agentic Tools
Stop thinking of `POST /predict` when someone says "serving AI". At Delivery Hero, we've rethought Gen AI infrastructure from the ground up, with async message queues, actor-model microservices, and zero-to-infinity autoscaling - no orchestrators, no waste, no surprising GPU bills. Here's the paradigm shift: treat every AI step as an independent async actor (we call them "asyas"). Data ingestion? One asya. Prompt construction? Another. Smart model routing? Another. Pre-processing, analysis, backend logic, even agents — dozens of specialized actors coexist on the same GPU cluster and talk to each other, each scaling from zero to whatever capacity you need. The result? Dramatically lower GPU costs, true composability, and a maintainable system that actually matches how AI workloads behave. We'll show the evolution of our project - DAGs to distributed stateless async actors - and demonstrate how naturally this architecture serves real-world production needs. The framework is open-source as `Asya`. If time permits, we'll also discuss bridging these async pipelines with synchronous MCP servers when real-time responses are required. Come see why async isn't an optimization — it's a paradigm shift for AI infrastructure.
# Agents in Production
# Prosus Group
# AI Drift
In this session, we’ll explore how developing and deploying AI-driven agents demands a fundamentally new testing paradigm—and how scalable simulations deliver the reliability, safety and human-feel that production-grade agents require. You’ll learn how simulations allow you to:

- Mirror messy real-world user behavior (multiple languages, emotional states, background noise) rather than scripting narrow “happy-path” dialogues.
- Model full conversation stacks including voice: turn-taking, background noise, accents, and latency – not just text messages.
- Embed automated simulation suites into your CI/CD pipeline so that every change to your agent is validated before going live.
- Assess multiple dimensions of agent performance—goal completion, brand-compliance, empathy, edge-case handling—and continuously guard against regressions.
- Scale from “works in demo” to “works for every customer scenario” and maintain quality as your agent grows in tasks, languages or domains.

Whether you’re building chat, voice, or multi-modal agents, you’ll walk away with actionable strategies for incorporating simulations into your workflow—improving reliability, reducing surprises in production, and enabling your agent to behave as thoughtfully and consistently as a human teammate.
# Agents in Production
# Prosus Group
# Agent Simulations
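A CI-gated simulation suite of the kind described above can be sketched with persona-driven test cases. Everything here is illustrative: `Persona`, `toy_agent`, and `run_simulation` are invented stand-ins for a real agent deployment and a real simulation harness.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    opening: str
    goal_keyword: str  # word that must appear in the reply for the goal to count

def toy_agent(message: str) -> str:
    """Stand-in for a deployed agent: refunds on complaint, greets otherwise."""
    if "refund" in message.lower():
        return "I have issued your refund."
    return "Hello! How can I help?"

def run_simulation(personas: list[Persona]) -> dict[str, bool]:
    """Run each persona once and record whether its goal was completed.

    In a real suite each persona would drive a multi-turn conversation
    (language, emotion, even synthesized audio) and score several axes:
    goal completion, brand compliance, empathy. A failing dict entry
    fails the CI job, blocking the agent change from going live.
    """
    return {p.name: p.goal_keyword in toy_agent(p.opening).lower()
            for p in personas}

personas = [
    Persona("angry_en", "I want a REFUND now!", "refund"),
    Persona("polite_de", "Ich möchte bitte eine Rückerstattung (refund).", "refund"),
    Persona("greeting", "hi there", "hello"),
]
results = run_simulation(personas)
assert all(results.values()), f"regressions: {results}"
```

The point of the structure is that personas are data, so widening coverage (new languages, new emotional states) means adding rows, not rewriting the harness.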
Developing AI agents for shopping is just the first step; the real challenge is reliably running them in production across complex, mission-critical e-commerce systems—a significant MLOps hurdle. In this talk, we'll introduce Alfred, our agentic orchestration layer. Built with tools like LangGraph, LangFuse, LiteLLM, and Google Cloud components, Alfred is the critical piece that coordinates LLMs with our entire e-commerce backend—from search and recommendations to cart management. It handles the complete execution graph, secured tool calling, and prompt workflow. We’ll share our journey in designing a reusable agent architecture that scales across all our digital properties. We’ll discuss the specifics of our tech stack and productionization methodology, including how we leveraged the MCP framework and our existing platform APIs to accelerate development of Alfred.
# Agents in Production
# Prosus Group
# Agentic Commerce
Single-agent LLM systems fail silently in production - they're confidently wrong at scale with no mechanism for self-correction. We've deployed a multi-agent orchestration pattern called "structured dissent" where believer, skeptic, and neutral agents debate decisions before consensus. This isn't theoretical - we'll show production deployment patterns, cost/performance tradeoffs, and measurable reliability improvements. You'll learn when multi-agent architectures justify the overhead, how to orchestrate adversarial agents effectively, and operational patterns for monitoring agent reasoning quality in production. Our first deployment of the debate swarm revolves around MCP servers - we use a security swarm specially built for MCP servers to analyze findings from open source security tools. This provides more nuanced reasoning and gives a confidence score to evaluate the security of unknown MCP tools.
# Agents in Production
# Prosus Group
# Production Reliability
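The believer/skeptic/neutral pattern can be sketched as a small orchestration loop. This is a hypothetical illustration of the general "structured dissent" idea, not the speakers' implementation: the role functions below are numeric stand-ins for real LLM calls, and the consensus rule is an assumption.

```python
def debate(proposal: str, believer, skeptic, neutral, rounds: int = 2):
    """Run a fixed number of debate rounds, then let a neutral arbiter decide.

    Each round, the believer argues for the proposal and the skeptic against
    it, both seeing the transcript so far. The arbiter sees the full debate
    and emits a decision plus a confidence score, giving a monitorable record
    of *why* the swarm accepted or escalated.
    """
    transcript = []
    for r in range(rounds):
        pro = believer(proposal, transcript)
        con = skeptic(proposal, transcript)
        transcript.append({"round": r, "pro": pro, "con": con})
    verdict = neutral(proposal, transcript)
    return {"proposal": proposal, "transcript": transcript, "verdict": verdict}

# Toy role implementations: confidence values in [0, 1] replace LLM outputs.
def believer(p, transcript):
    return {"stance": "accept", "confidence": 0.9}

def skeptic(p, transcript):
    return {"stance": "reject", "confidence": 0.3}

def neutral(p, transcript):
    last = transcript[-1]
    margin = last["pro"]["confidence"] - last["con"]["confidence"]
    # Accept only when belief clearly outweighs dissent; otherwise escalate
    # to a human - the self-correction single agents lack.
    return {"decision": "accept" if margin > 0.4 else "escalate",
            "confidence_score": round(margin, 2)}

result = debate("allow unknown MCP tool 'fetch_url'", believer, skeptic, neutral)
print(result["verdict"])
```

The overhead is explicit in this shape: `rounds` extra model calls per decision, traded for a transcript you can monitor and a calibrated confidence score instead of a single silent answer.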
Personalization at scale needs deep understanding of each customer. You must collect data from many sources, read it, reason and infer, plan, decide, act, and write to each person. One agent doing everything gave us poor and inconsistent quality. Multi-agent systems changed that. They deliver mass personalization. They also break in edge cases, contradict each other, and are hard to debug. I will share how we addressed this with Cortex UCM, a unified customer memory, and Generative Tables. We map noisy data into a clean, structured layer that agents read and write. We began with email for both outbound and inbound communication. Then we personalized websites and product pages for e-commerce at scale. I will share customer stories. For example, one customer had over 60,000 product pages that required customization for thousands of communities and product offerings. I will present our decentralized shared-memory orchestration briefly and how it stays transparent and debuggable. It opens safe paths for external agents. What failed. What worked. What we are building next.
# Agents in Production
# Prosus Group
# Multi-Agent Personalities
Matt Sharp · Nov 27th, 2025
There's never been a better time to be a hacker. With the explosion of vibe-coded solutions full of vulnerabilities and the power and ease that LLMs and Agents lend to hackers, we are seeing an increase in attacks. This talk dives into several vulnerabilities that agent systems have introduced and how they are already being exploited.
# Agents in Production
# Prosus Group
# Hacker
We reflect on how the complexity of an agent analytics project at an international pharma company taught us to move from prompt engineering to context engineering, empowering agents with interactive tooling to build their context dynamically.
# Agents in Production
# Prosus Group
# Enterprise Analytics
Santoshkalyan Rayadhurgam · Nov 27th, 2025
Search is still the front door of most digital products—and it’s brittle. Keyword heuristics and static ranking pipelines struggle with messy, ambiguous queries. Traditionally, fixing this meant years of hand-engineering and expensive labeling. Large language models change that equation: they let us deploy agents that act like search engineers—rewriting queries, disambiguating intent, and even judging relevance on the fly. In this talk, I’ll show how to put these agents to work in real production systems. We’ll look at simple but powerful patterns—query rewriting, hybrid retrieval, agent-based reranking—and what actually happens when you deploy them at scale. You’ll hear about the wins, the pitfalls, and the open questions. The goal: to leave you with a practical playbook for how agents can make search smarter, faster, and more adaptive—without turning your system into a black box.
# Agents in Production
# Prosus Group
# Search Engine Agents
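The query-rewriting and hybrid-retrieval patterns named above fit into a short pipeline. This is a minimal sketch under stated assumptions: `rewrite_query` stands in for an LLM rewrite, and the "semantic" score is just Jaccard overlap standing in for vector similarity; none of this is a real search engine's API.

```python
def rewrite_query(q: str) -> str:
    """Stand-in for an LLM rewriter that disambiguates intent."""
    aliases = {"ny pizza": "new york style pizza restaurant"}
    return aliases.get(q.lower(), q)

def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    """Blend a keyword score with a crude semantic stand-in.

    In production the two signals would come from a lexical index (e.g.
    BM25) and an embedding model; the blend weight alpha is the knob that
    makes retrieval 'hybrid'.
    """
    q_terms, d_terms = set(query.lower().split()), set(doc.lower().split())
    keyword = len(q_terms & d_terms) / max(len(q_terms), 1)
    semantic = len(q_terms & d_terms) / max(len(q_terms | d_terms), 1)
    return alpha * keyword + (1 - alpha) * semantic

def search(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rewrite, score, and rerank: the agent-as-search-engineer loop."""
    q = rewrite_query(query)
    ranked = sorted(docs, key=lambda d: hybrid_score(q, d), reverse=True)
    return ranked[:top_k]

docs = ["new york style pizza restaurant downtown",
        "pizza dough recipe",
        "york england travel guide"]
print(search("ny pizza", docs))
```

Each stage is a separate function on purpose: swapping the rewriter or the reranker for an LLM call changes one function, keeping the pipeline inspectable rather than a black box.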