MLOps Community

Agents in Production - Prosus x MLOps
29 items

Satish Bhambri & Demetrios Brinkmann · Dec 12th, 2025
Does AgenticRAG Really Work?
Satish Bhambri is a Sr Data Scientist at Walmart Labs, working on large-scale recommendation systems and conversational AI, including RAG-powered GroceryBot agents, vector-search personalization, and transformer-based ad relevance models.
# AgenticRAG
# AI Engineer
# AI Agents
Tom Kaltofen · Dec 11th, 2025
Modern AI agents depend on vast amounts of context (data, features, and intermediate states) to make correct decisions. In practice, this context is often tied to specific datasets or infrastructure, leading to brittle pipelines and unpredictable behaviour when agents move from prototypes to production. This talk introduces mloda, an open-source Python framework that makes data, feature, and context engineering shareable. By separating what you compute from how you compute it, mloda provides the missing abstraction layer for AI pipelines, allowing teams to build deterministic context layers that agents can rely on. Attendees will learn how mloda's plugin-based architecture (minimal dependencies, BYOB design) enables clean separation of transformation logic from execution environments. We'll explore how built-in input/output validation and test-driven development help you build robust contexts. The session will demonstrate how mloda can generate production-ready data flows, with real-world examples showing deterministic context layers carried from laptop prototypes to cloud deployments.
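To make the "separate what you compute from how you compute it" idea concrete, here is a minimal sketch in plain Python. It is not mloda's actual API; the `Feature`, `AvgBasketSize`, and `LocalEngine` names are illustrative assumptions.

```python
from abc import ABC, abstractmethod


class Feature(ABC):
    """Declares WHAT to compute; knows nothing about the backend."""
    name: str

    @abstractmethod
    def transform(self, row: dict) -> float:
        ...


class AvgBasketSize(Feature):
    name = "avg_basket_size"

    def transform(self, row: dict) -> float:
        return row["total_items"] / max(row["num_orders"], 1)


class LocalEngine:
    """One possible HOW: plain in-process execution. A hypothetical
    cloud engine could run the same Feature objects unchanged."""

    def run(self, features: list[Feature], rows: list[dict]) -> list[dict]:
        return [{f.name: f.transform(r) for f in features} for r in rows]


engine = LocalEngine()
result = engine.run([AvgBasketSize()], [{"total_items": 12, "num_orders": 4}])
print(result)  # [{'avg_basket_size': 3.0}]
```

Because the transformation logic never touches the execution environment, swapping the engine leaves every `Feature` unchanged, which is the property that makes a context layer deterministic from laptop to cloud.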
# Agents in Production
# Prosus Group
# Context Layers
Zack Reneau-Wedeen & Demetrios Brinkmann · Dec 10th, 2025
Sierra’s Zack Reneau-Wedeen claims we’re building AI all wrong and that “context engineering,” not bigger models, is where the real breakthroughs will come from. In this episode, he and Demetrios Brinkmann unpack why AI behaves more like a moody coworker than traditional software, why testing it with real-world chaos (noise, accents, abuse, even bad mics) matters, and how Sierra’s simulations and model “constellations” aim to fix the industry’s reliability problems. They even argue that decision trees are dead, replaced by goals, guardrails, and speculative-execution tricks that make voice AI actually usable. Plus: how Sierra trains grads to become product-engineering hybrids, and why obsessing over customers might be the only way AI agents stop disappointing everyone.
# AI Systems
# Agent Simulations
# AI Voice Agent
Kopal Garg · Dec 10th, 2025
Everyone obsesses over models, but NVIDIA’s stack makes it obvious: the real power move is owning everything around the model. NeMo trains it, RAPIDS cleans it, TensorRT speeds it up, Triton serves it, Operators manage it — and the hardware seals the deal. It’s less a toolkit and more a gravity well for your entire GenAI pipeline. Once you’re in, good luck escaping.
# Generative AI
# AI Frameworks
# NVIDIA
Building agentic tools for production requires far more than a simple chatbot interface. The real value comes from agents that can reliably take action at scale, integrate with core systems, and execute tasks through secure, controlled workflows. Yet most agentic tools never make it to production. Teams run into issues like strict security requirements, infrastructure complexity, latency constraints, high operational costs, and inconsistent behavior. To understand what it takes to ship production-grade agents, let's break down the key requirements one by one.
# Agents in Production
# Prosus Group
# Agentic Tools
Spencer Reagan & Demetrios Brinkmann · Dec 5th, 2025
Spencer Reagan thinks it might be, and he’s not shy about saying so. In this episode, he and Demetrios Brinkmann get real about the messy, over-engineered state of agent systems, why LLMs still struggle in the wild, and how enterprises keep tripping over their own data chaos. They unpack red-teaming, security headaches, and the uncomfortable truth that most “AI platforms” still don’t scale. If you want a sharp, no-fluff take on where agents are actually headed, this one’s worth a listen.
# AI Governance
# AI Agents
# AI infrastructure
In this session, we’ll explore how developing and deploying AI-driven agents demands a fundamentally new testing paradigm—and how scalable simulations deliver the reliability, safety, and human feel that production-grade agents require. You’ll learn how simulations allow you to:
- Mirror messy real-world user behavior (multiple languages, emotional states, background noise) rather than scripting narrow “happy-path” dialogues.
- Model full conversation stacks, including voice: turn-taking, background noise, accents, and latency, not just text messages.
- Embed automated simulation suites into your CI/CD pipeline so that every change to your agent is validated before going live.
- Assess multiple dimensions of agent performance (goal completion, brand compliance, empathy, edge-case handling) and continuously guard against regressions.
- Scale from “works in demo” to “works for every customer scenario” and maintain quality as your agent grows in tasks, languages, or domains.

Whether you’re building chat, voice, or multi-modal agents, you’ll walk away with actionable strategies for incorporating simulations into your workflow—improving reliability, reducing surprises in production, and enabling your agent to behave as thoughtfully and consistently as a human teammate.
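As one illustration of embedding a simulation suite into CI, here is a minimal sketch. The `agent_reply` stub and the persona list are hypothetical stand-ins, not the speaker's actual tooling; a real suite would call the deployed agent and cover many more dimensions (voice, latency, empathy).

```python
import random


def agent_reply(utterance: str) -> str:
    """Stand-in for the real agent under test (hypothetical)."""
    if "refund" in utterance.lower():
        return "I can help with that refund."
    return "Could you tell me more?"


PERSONAS = [
    # (simulated user input, required substring in the reply)
    ("I want a refund!!!", "refund"),                     # angry user
    ("rEfUnD plz", "refund"),                             # noisy spelling
    ("hola, necesito un reembolso (refund)", "refund"),   # mixed language
]


def run_suite(seed: int = 0) -> list[tuple[str, bool]]:
    random.seed(seed)  # deterministic runs so CI failures are reproducible
    results = []
    for utterance, expected in PERSONAS:
        reply = agent_reply(utterance)
        results.append((utterance, expected in reply.lower()))
    return results


if __name__ == "__main__":
    outcomes = run_suite()
    assert all(ok for _, ok in outcomes), outcomes
    print(f"{len(outcomes)} simulated conversations passed")
```

Running this as a CI step fails the build whenever a change to the agent breaks any simulated scenario, which is the "validated before going live" property described above.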
# Agents in Production
# Prosus Group
# Agent Simulations
Overcome the friction of boilerplate code and infrastructure wrangling by adopting a declarative approach to AI agent development. This article introduces Ackgent, a production-ready template built on Google’s Agent Development Kit (ADK) and Agent Config, which allows developers to define agent behaviors via structured YAML files while keeping implementation logic in Python. Learn how to leverage a modern stack—including uv, just, and the Model Context Protocol (MCP)—to rapidly prototype, test, and deploy scalable multi-agent systems on Google Cloud Run.
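The split between declarative configuration and Python implementation logic can be sketched as follows. The config dict stands in for a parsed YAML file, and the field names and tool registry are illustrative assumptions, not the actual ADK Agent Config schema.

```python
# Hypothetical agent config, as it might look after parsing a YAML file.
AGENT_CONFIG = {
    "name": "support_agent",
    "model": "gemini-2.0-flash",
    "tools": ["lookup_order", "issue_refund"],
}

# Implementation logic stays in Python: tools are registered by name.
TOOL_REGISTRY = {}


def tool(fn):
    TOOL_REGISTRY[fn.__name__] = fn
    return fn


@tool
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"


@tool
def issue_refund(order_id: str) -> str:
    return f"refund issued for {order_id}"


def build_agent(config: dict) -> dict:
    """Wire declared tool names to their Python implementations."""
    missing = [t for t in config["tools"] if t not in TOOL_REGISTRY]
    if missing:
        raise ValueError(f"unregistered tools: {missing}")
    return {**config, "tools": {t: TOOL_REGISTRY[t] for t in config["tools"]}}


agent = build_agent(AGENT_CONFIG)
print(agent["tools"]["lookup_order"]("A123"))  # order A123: shipped
```

The payoff of the declarative split is that behavior changes (renaming the agent, adding a declared tool) touch only the YAML, while the Python side fails fast at build time if the config references a tool that was never implemented.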
# AI Agents
# Generative AI Agents
# Artificial Intelligence
# Google ADK
# Data Science
Jon Saad-Falcon, Jimin (Anna) Yoon & Arthur Coleman · Dec 1st, 2025
Language models are getting better at reasoning, but their ability to verify their own outputs still lags behind. This paper tackles that challenge head-on by introducing Weaver, a framework that combines multiple weak verifiers into a single, stronger verifier without relying heavily on labeled data. Weaver uses weak supervision to estimate verifier reliability, normalize inconsistent outputs, and filter low-quality signals, resulting in a unified score that better reflects true response quality. In practice, this approach significantly boosts performance on reasoning and math tasks, rivaling models several times larger; for example, it achieves o3-mini-level accuracy using only Llama 3.3 70B as the generator.
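A toy version of the core idea (weighting weak verifiers by an unsupervised reliability estimate, here simply agreement with the majority vote) can be sketched as follows. Weaver itself uses a weak-supervision model to estimate reliability, not this heuristic, and the numbers below are made up for illustration.

```python
def combine_verifiers(scores: list[list[float]]) -> list[float]:
    """scores[v][i]: verifier v's score for candidate response i, in [0, 1].
    Weight each verifier by how often it agrees with the majority vote,
    a label-free stand-in for a learned reliability estimate."""
    n = len(scores[0])
    # Majority vote per candidate (threshold each score at 0.5)
    majority = [sum(s[i] > 0.5 for s in scores) > len(scores) / 2
                for i in range(n)]
    # Verifier weight = agreement rate with the majority
    weights = [sum((s[i] > 0.5) == majority[i] for i in range(n)) / n
               for s in scores]
    total = sum(weights) or 1.0
    # Unified score: reliability-weighted average of the raw scores
    return [sum(w * s[i] for w, s in zip(weights, scores)) / total
            for i in range(n)]


# Two consistent verifiers and one noisy one that disagrees with them
unified = combine_verifiers([
    [0.9, 0.2, 0.8],
    [0.8, 0.1, 0.9],
    [0.1, 0.9, 0.2],  # noisy verifier: ends up with weight 0
])
print(unified)
```

The noisy verifier never agrees with the majority, so its weight collapses to zero and the unified score tracks the two reliable verifiers, which is the filtering behavior the abstract describes.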
# LLM Verification
# Weak Verifiers
# RAG Systems
Developing AI agents for shopping is just the first step; the real challenge is reliably running them in production across complex, mission-critical e-commerce systems, a significant MLOps hurdle. In this talk, we'll introduce Alfred, our agentic orchestration layer. Built with tools like LangGraph, Langfuse, LiteLLM, and Google Cloud components, Alfred is the critical piece that coordinates LLMs with our entire e-commerce backend, from search and recommendations to cart management. It handles the complete execution graph, secured tool calling, and prompt workflow. We’ll share our journey in designing a reusable agent architecture that scales across all our digital properties, and discuss the specifics of our tech stack and productionization methodology, including how we leveraged the MCP framework and our existing platform APIs to accelerate Alfred's development.
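Secured tool calling of the kind described can be reduced to an allowlist check at the orchestration layer. This sketch is plain Python with hypothetical tool and agent names; it is not Alfred's actual implementation or the LangGraph API.

```python
# Tools exposed to agents by the orchestration layer (illustrative).
TOOLS = {
    "search_products": lambda query: [f"result for {query}"],
    "add_to_cart": lambda sku: f"added {sku}",
}

# Per-agent permissions: which tools each agent may invoke.
ALLOWLIST = {
    "shopping_agent": {"search_products", "add_to_cart"},
    "faq_agent": {"search_products"},  # read-only: cannot mutate the cart
}


def call_tool(agent: str, tool: str, arg: str):
    """Refuse any tool call the agent is not explicitly permitted to make."""
    if tool not in ALLOWLIST.get(agent, set()):
        raise PermissionError(f"{agent} may not call {tool}")
    return TOOLS[tool](arg)


print(call_tool("shopping_agent", "add_to_cart", "SKU-42"))  # added SKU-42
```

Centralizing the check in the orchestration layer, rather than trusting each LLM call, means a prompt-injected or misbehaving agent still cannot reach tools outside its allowlist.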
# Agents in Production
# Prosus Group
# Agentic Commerce