MLOps Community

Agents in Production - Prosus x MLOps
27 Items

Sachi Shah · Dec 3rd, 2025
Simulate to Scale: How realistic simulations power reliable agents in production // Sachi Shah
In this session, we’ll explore how developing and deploying AI-driven agents demands a fundamentally new testing paradigm—and how scalable simulations deliver the reliability, safety, and human feel that production-grade agents require. You’ll learn how simulations allow you to:
- Mirror messy real-world user behavior (multiple languages, emotional states, background noise) rather than scripting narrow “happy-path” dialogues.
- Model full conversation stacks including voice: turn-taking, background noise, accents, and latency, not just text messages.
- Embed automated simulation suites into your CI/CD pipeline so that every change to your agent is validated before going live.
- Assess multiple dimensions of agent performance—goal completion, brand compliance, empathy, edge-case handling—and continuously guard against regressions.
- Scale from “works in demo” to “works for every customer scenario” and maintain quality as your agent grows in tasks, languages, or domains.
Whether you’re building chat, voice, or multi-modal agents, you’ll walk away with actionable strategies for incorporating simulations into your workflow—improving reliability, reducing surprises in production, and enabling your agent to behave as thoughtfully and consistently as a human teammate.
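To make the CI/CD point concrete, here is a minimal sketch (not the speaker's actual harness) of what a pytest-style simulation gate might look like; `simulate_conversation` and `score_transcript` are hypothetical stand-ins for a real simulation harness and evaluators.

```python
# A minimal sketch of a simulation gate that could run in CI before an agent
# build goes live. The harness functions below are stubs; a real suite would
# drive the candidate agent and score real transcripts.
import pytest

def simulate_conversation(agent: str, persona: dict, goal: str) -> list[dict]:
    """Stub standing in for a simulated user driving the candidate agent."""
    return [{"role": "user", "text": goal, "persona": persona, "agent": agent}]

def score_transcript(transcript: list[dict]) -> dict:
    """Stub standing in for automated evaluators (goal completion, tone, policy)."""
    return {"goal_completed": True, "brand_compliance": 0.95, "empathy": 0.9}

PERSONAS = [
    {"language": "en", "mood": "frustrated", "background_noise": "street"},
    {"language": "es", "mood": "calm", "background_noise": "call_center"},
]

@pytest.mark.parametrize("persona", PERSONAS)
def test_refund_flow_simulation(persona):
    transcript = simulate_conversation(
        agent="support-agent:candidate",
        persona=persona,
        goal="request a refund for a late delivery",
    )
    scores = score_transcript(transcript)

    # Gate the deploy on every dimension, not just task success.
    assert scores["goal_completed"]
    assert scores["brand_compliance"] >= 0.9
    assert scores["empathy"] >= 0.8
```

In a real pipeline, the stubs would be replaced by calls into the simulation service, and the thresholds tuned per product and persona.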
# Agents in Production
# Prosus Group
# Agent Simulations
Médéric Hurier · Dec 2nd, 2025
Overcome the friction of boilerplate code and infrastructure wrangling by adopting a declarative approach to AI agent development. This article introduces Ackgent, a production-ready template built on Google’s Agent Development Kit (ADK) and Agent Config, which allows developers to define agent behaviors via structured YAML files while keeping implementation logic in Python. Learn how to leverage a modern stack—including uv, just, and the Model Context Protocol (MCP)—to rapidly prototype, test, and deploy scalable multi-agent systems on Google Cloud Run.
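As a rough illustration of the declarative split the article describes (a toy schema, not Ackgent's or ADK's actual Agent Config format), behavior lives in YAML while Python only loads and wires it:

```python
# Toy illustration of declarative agent definition: the YAML below is an
# invented schema for demonstration, not the real Agent Config format.
from dataclasses import dataclass, field
import yaml  # PyYAML

AGENT_YAML = """
name: support_triage
model: gemini-2.0-flash   # hypothetical model id for illustration
instruction: |
  Classify the incoming ticket and route it to billing, technical, or general.
tools:
  - lookup_order
  - escalate_to_human
"""

@dataclass
class AgentConfig:
    name: str
    model: str
    instruction: str
    tools: list[str] = field(default_factory=list)

def load_agent_config(text: str) -> AgentConfig:
    """Parse a YAML agent definition into a typed config object."""
    return AgentConfig(**yaml.safe_load(text))

config = load_agent_config(AGENT_YAML)
print(config.name, config.tools)  # behavior is data; only the runtime is code
```

The appeal of this split is that prompt and tool changes become reviewable data edits, while the runtime code stays untouched.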
# AI Agents
# Generative AI Agents
# Artificial Intelligence
# Google ADK
# Data Science
Jon Saad-Falcon, Jimin (Anna) Yoon & Arthur Coleman · Dec 1st, 2025
Language models are getting better at reasoning, but their ability to verify their own outputs still lags behind. This paper tackles that challenge head-on by introducing Weaver, a framework that combines multiple weak verifiers into a single, stronger verifier without relying heavily on labeled data. Weaver uses weak supervision to estimate verifier reliability, normalize inconsistent outputs, and filter low-quality signals, resulting in a unified score that better reflects true response quality. In practice, this approach significantly boosts performance on reasoning and math tasks, rivaling models several times larger; for example, it achieves o3-mini-level accuracy using only Llama 3.3 70B as the generator.
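As a toy illustration of the general idea (this is not the paper's actual weak-supervision estimator), one can normalize each weak verifier's scores, derive a crude reliability weight from agreement with the other verifiers, and aggregate into a single unified score:

```python
# Toy weak-verifier aggregation: weight each verifier by how closely it tracks
# the consensus, then combine into one score per candidate response.
import statistics

def unified_score(verifier_scores: dict[str, list[float]]) -> list[float]:
    """verifier_scores maps verifier name -> one score in [0, 1] per candidate."""
    names = list(verifier_scores)
    n_candidates = len(next(iter(verifier_scores.values())))

    # Consensus per candidate: mean score across all verifiers.
    consensus = [statistics.mean(verifier_scores[v][i] for v in names)
                 for i in range(n_candidates)]

    # Crude reliability: verifiers that agree with the consensus get more weight.
    weights = {}
    for v in names:
        err = statistics.mean(abs(verifier_scores[v][i] - consensus[i])
                              for i in range(n_candidates))
        weights[v] = 1.0 - err

    total = sum(weights.values())
    return [sum(weights[v] * verifier_scores[v][i] for v in names) / total
            for i in range(n_candidates)]

# Three weak verifiers scoring four candidate answers to the same question.
scores = {
    "reward_model":   [0.80, 0.40, 0.75, 0.20],
    "lm_judge":       [0.90, 0.35, 0.60, 0.30],
    "unit_test_pass": [1.00, 0.00, 1.00, 0.00],
}
print(unified_score(scores))  # pick the highest-scoring candidate
```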
# LLM Verification
# Weak Verifiers
# RAG Systems
Developing AI agents for shopping is just the first step; the real challenge is reliably running them in production across complex, mission-critical e-commerce systems—a significant MLOps hurdle. In this talk, we'll introduce Alfred, our agentic orchestration layer. Built with tools like LangGraph, Langfuse, LiteLLM, and Google Cloud components, Alfred is the critical piece that coordinates LLMs with our entire e-commerce backend—from search and recommendations to cart management. It handles the complete execution graph, secure tool calling, and prompt workflows. We’ll share our journey in designing a reusable agent architecture that scales across all our digital properties. We’ll discuss the specifics of our tech stack and productionization methodology, including how we leveraged the MCP framework and our existing platform APIs to accelerate development of Alfred.
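For readers unfamiliar with the orchestration-layer pattern, here is a minimal LangGraph sketch of the routing idea; it is illustrative only and not Alfred's actual graph, with keyword stubs standing in for LLM calls and backend APIs.

```python
# Minimal LangGraph routing sketch (illustrative, not Alfred's real graph):
# a router node decides which backend capability handles the request.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict, total=False):
    query: str
    route: str
    result: str

def route_request(state: State) -> dict:
    # In a production orchestrator this would be an LLM call; here it's a stub.
    return {"route": "cart" if "cart" in state["query"].lower() else "search"}

def search_products(state: State) -> dict:
    return {"result": f"search results for: {state['query']}"}

def manage_cart(state: State) -> dict:
    return {"result": f"cart updated based on: {state['query']}"}

graph = StateGraph(State)
graph.add_node("router", route_request)
graph.add_node("search", search_products)
graph.add_node("cart", manage_cart)
graph.set_entry_point("router")
graph.add_conditional_edges("router", lambda s: s["route"],
                            {"search": "search", "cart": "cart"})
graph.add_edge("search", END)
graph.add_edge("cart", END)

app = graph.compile()
print(app.invoke({"query": "add running shoes to my cart"}))
```

In production, the router would be an LLM call with secured tool schemas, and each node would hit the platform APIs the talk describes.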
# Agents in Production
# Prosus Group
# Agentic Commerce
Single-agent LLM systems fail silently in production - they're confidently wrong at scale with no mechanism for self-correction. We've deployed a multi-agent orchestration pattern called "structured dissent" where believer, skeptic, and neutral agents debate decisions before consensus. This isn't theoretical - we'll show production deployment patterns, cost/performance tradeoffs, and measurable reliability improvements. You'll learn when multi-agent architectures justify the overhead, how to orchestrate adversarial agents effectively, and operational patterns for monitoring agent reasoning quality in production. Our first deployment of the debate swarm revolves around MCP servers - we use a security swarm specially built for MCP servers to analyze findings from open source security tools. This provides more nuanced reasoning and gives a confidence score to evaluate the security of unknown MCP tools.
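A minimal sketch of the structured-dissent pattern might look like the following; `call_llm` is a hypothetical stand-in for a real model client, and the consensus step is where the confidence score mentioned above would be produced.

```python
# Sketch of the "structured dissent" debate pattern: three role-specialized
# agents argue, then a consensus pass commits to a decision.
ROLES = {
    "believer": "Argue why the proposed action is safe and should proceed.",
    "skeptic":  "Argue why the proposed action is risky and should be blocked.",
    "neutral":  "Weigh both sides impartially and note what evidence is missing.",
}

def call_llm(system: str, prompt: str) -> str:
    """Hypothetical stand-in for an actual LLM client call."""
    return f"[{system.split('.')[0]}] analysis of: {prompt}"

def structured_dissent(proposal: str) -> dict:
    # Each role sees the same proposal but argues from its assigned stance.
    arguments = {role: call_llm(instruction, proposal)
                 for role, instruction in ROLES.items()}

    # Consensus pass: a final call sees all three arguments and must commit to
    # a decision plus a confidence score operators can monitor in production.
    verdict = call_llm(
        "You are the consensus judge. Return APPROVE or REJECT with a confidence 0-1.",
        "\n\n".join(f"{role.upper()}:\n{text}" for role, text in arguments.items()),
    )
    return {"proposal": proposal, "arguments": arguments, "verdict": verdict}

print(structured_dissent("Allow this unknown MCP tool to access the filesystem"))
```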
# Agents in Production
# Prosus Group
# Production Reliability
Personalization at scale needs a deep understanding of each customer. You must collect data from many sources, read it, reason and infer, plan, decide, act, and write to each person. One agent doing everything gave us poor and inconsistent quality. Multi-agent systems changed that: they deliver mass personalization, but they also break in edge cases, contradict each other, and are hard to debug. I will share how we addressed this with Cortex UCM, a unified customer memory, and Generative Tables. We map noisy data into a clean, structured layer that agents read and write. We began with email for both outbound and inbound communication, then personalized websites and product pages for e-commerce at scale. I will share customer stories; for example, one customer had over 60,000 product pages that required customization for thousands of communities and product offerings. I will briefly present our decentralized shared-memory orchestration, how it stays transparent and debuggable, and how it opens safe paths for external agents. What failed. What worked. What we are building next.
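To illustrate the shared-memory idea in this abstract (a toy sketch, not Cortex UCM's actual API), agents coordinate only by reading and writing one structured, auditable record per customer:

```python
# Toy unified-customer-memory sketch: agents never message each other directly;
# they read from and write to a single structured record, leaving an audit trail.
from datetime import datetime, timezone

class CustomerMemory:
    """Unified, structured record that every agent reads and writes."""
    def __init__(self):
        self._store: dict[str, dict] = {}
        self.audit_log: list[tuple] = []  # who wrote what, and when

    def read(self, customer_id: str) -> dict:
        return dict(self._store.get(customer_id, {}))

    def write(self, customer_id: str, agent: str, field: str, value) -> None:
        self._store.setdefault(customer_id, {})[field] = value
        self.audit_log.append(
            (datetime.now(timezone.utc), agent, customer_id, field, value))

memory = CustomerMemory()
memory.write("cust-42", agent="email-inbound", field="intent", value="asked about sizing")
memory.write("cust-42", agent="web-personalizer", field="segment", value="running")
print(memory.read("cust-42"))
print(memory.audit_log)  # the debuggable trail of which agent changed what
```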
# Agents in Production
# Prosus Group
# Multi-Agent Personalization
Matt Sharp · Nov 27th, 2025
There's never been a better time to be a hacker. With the explosion of vibe-coded solutions full of vulnerabilities, and the power and ease that LLMs and agents lend to hackers, we are seeing an increase in attacks. This talk dives into several vulnerabilities that agent systems have introduced and how they are already being exploited.
# Agents in Production
# Prosus Group
# Hacker
We reflect on how the complexity of an agent analytics project at an international pharma company taught us to move from prompt engineering to context engineering, empowering agents with interactive tooling to build their context dynamically.
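A minimal sketch of that shift, under the assumption that the agent is given small metadata tools it can call on demand rather than one giant prompt (the helpers below are hypothetical):

```python
# Context engineering sketch: instead of stuffing every schema into the prompt,
# the agent assembles only the context it needs via small, on-demand tools.
def list_tables() -> list[str]:
    """Stub for a metadata lookup the agent can call on demand."""
    return ["sales_2024", "clinical_trials", "inventory"]

def describe_table(name: str) -> dict:
    """Stub returning schema details only when the agent asks for them."""
    schemas = {"sales_2024": {"columns": ["region", "product", "revenue"]}}
    return schemas.get(name, {"columns": []})

TOOLS = {"list_tables": list_tables, "describe_table": describe_table}

# The agent builds its own context: discover tables, fetch only the relevant
# schema, and only then draft a query for the question at hand.
question = "Which region drove the most 2024 revenue?"
tables = TOOLS["list_tables"]()
schema = TOOLS["describe_table"]("sales_2024")
print(question, tables, schema)
```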
# Agents in Production
# Prosus Group
# Enterprise Analytics
Santoshkalyan Rayadhurgam · Nov 27th, 2025
Search is still the front door of most digital products—and it’s brittle. Keyword heuristics and static ranking pipelines struggle with messy, ambiguous queries. Traditionally, fixing this meant years of hand-engineering and expensive labeling. Large language models change that equation: they let us deploy agents that act like search engineers—rewriting queries, disambiguating intent, and even judging relevance on the fly. In this talk, I’ll show how to put these agents to work in real production systems. We’ll look at simple but powerful patterns—query rewriting, hybrid retrieval, agent-based reranking—and what actually happens when you deploy them at scale. You’ll hear about the wins, the pitfalls, and the open questions. The goal: to leave you with a practical playbook for how agents can make search smarter, faster, and more adaptive—without turning your system into a black box.
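As an illustrative sketch of the patterns named above (query rewriting, hybrid retrieval, agent-based reranking), with a stubbed `llm` helper standing in for a real model call and trivial retrieval logic to keep it runnable:

```python
# Search-agent sketch: rewrite a messy query, retrieve candidates, then let an
# "agent" rerank them. All model and retrieval calls are stubs for illustration.
def llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM client; echoes the last line for demo."""
    return prompt.splitlines()[-1]

def rewrite_query(raw_query: str) -> str:
    # The agent cleans up a messy, ambiguous query before retrieval.
    return llm(f"Rewrite this search query to be specific and unambiguous:\n{raw_query}")

def hybrid_retrieve(query: str, k: int = 20) -> list[str]:
    # Stub: a real system would merge keyword (BM25) and vector results here.
    return [f"doc-{i}: {query}" for i in range(k)]

def agent_rerank(query: str, docs: list[str], k: int = 5) -> list[str]:
    # The agent judges relevance on the fly; a trivial overlap score keeps it runnable.
    scored = sorted(docs,
                    key=lambda d: len(set(query.split()) & set(d.split())),
                    reverse=True)
    return scored[:k]

query = rewrite_query("cheap shoes run marathon??")
results = agent_rerank(query, hybrid_retrieve(query))
print(results)
```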
# Agents in Production
# Prosus Group
# Search Engine Agents
I went viral for spending $9k on Cursor in 1 month, and I wrote half a million lines of code building Zo Computer over the last 4 months. In this talk, I'll share everything I would've wanted to know when I was just getting started. Coding with AI done right is *leveraged thinking*, and I'll share the workflows we use on our team to ship high-quality code at breakneck speed.
# Agents in Production
# Prosus Group
# Velocity Coding