MLOps Community

Agents in Production - Prosus x MLOps
27 Items

Sachi Shah · Dec 3rd, 2025
Simulate to Scale: How realistic simulations power reliable agents in production // Sachi Shah
In this session, we’ll explore how developing and deploying AI-driven agents demands a fundamentally new testing paradigm—and how scalable simulations deliver the reliability, safety, and human feel that production-grade agents require. You’ll learn how simulations allow you to:
- Mirror messy real-world user behavior (multiple languages, emotional states, background noise) rather than scripting narrow “happy-path” dialogues.
- Model full conversation stacks including voice: turn-taking, background noise, accents, and latency, not just text messages.
- Embed automated simulation suites into your CI/CD pipeline so that every change to your agent is validated before going live.
- Assess multiple dimensions of agent performance—goal completion, brand compliance, empathy, edge-case handling—and continuously guard against regressions.
- Scale from “works in demo” to “works for every customer scenario” and maintain quality as your agent grows in tasks, languages, or domains.
Whether you’re building chat, voice, or multi-modal agents, you’ll walk away with actionable strategies for incorporating simulations into your workflow—improving reliability, reducing surprises in production, and enabling your agent to behave as thoughtfully and consistently as a human teammate.
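To make the CI/CD point concrete, here is a minimal sketch (not the speaker's actual harness) of what a pytest-style simulation gate might look like; `simulate_conversation` and `score_transcript` are hypothetical stand-ins for a real simulation harness and evaluators.

```python
# A minimal sketch of a simulation gate that could run in CI before an agent
# build goes live. The harness functions below are stubs; a real suite would
# drive the candidate agent and score real transcripts.
import pytest

def simulate_conversation(agent: str, persona: dict, goal: str) -> list[dict]:
    """Stub standing in for a simulated user driving the candidate agent."""
    return [{"role": "user", "text": goal, "persona": persona, "agent": agent}]

def score_transcript(transcript: list[dict]) -> dict:
    """Stub standing in for automated evaluators (goal completion, tone, policy)."""
    return {"goal_completed": True, "brand_compliance": 0.95, "empathy": 0.9}

PERSONAS = [
    {"language": "en", "mood": "frustrated", "background_noise": "street"},
    {"language": "es", "mood": "calm", "background_noise": "call_center"},
]

@pytest.mark.parametrize("persona", PERSONAS)
def test_refund_flow_simulation(persona):
    transcript = simulate_conversation(
        agent="support-agent:candidate",
        persona=persona,
        goal="request a refund for a late delivery",
    )
    scores = score_transcript(transcript)

    # Gate the deploy on every dimension, not just task success.
    assert scores["goal_completed"]
    assert scores["brand_compliance"] >= 0.9
    assert scores["empathy"] >= 0.8
```

In a real pipeline, the stubs would be replaced by calls into the simulation service, and the thresholds tuned per product and persona.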
# Agents in Production
# Prosus Group
# Agent Simulations
Médéric Hurier · Dec 2nd, 2025
Overcome the friction of boilerplate code and infrastructure wrangling by adopting a declarative approach to AI agent development. This article introduces Ackgent, a production-ready template built on Google’s Agent Development Kit (ADK) and Agent Config, which allows developers to define agent behaviors via structured YAML files while keeping implementation logic in Python. Learn how to leverage a modern stack—including uv, just, and the Model Context Protocol (MCP)—to rapidly prototype, test, and deploy scalable multi-agent systems on Google Cloud Run.
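As a rough illustration of the declarative split the article describes (a toy schema, not Ackgent's or ADK's actual Agent Config format), behavior lives in YAML while Python only loads and wires it:

```python
# Toy illustration of declarative agent definition: the YAML below is an
# invented schema for demonstration, not the real Agent Config format.
from dataclasses import dataclass, field
import yaml  # PyYAML

AGENT_YAML = """
name: support_triage
model: gemini-2.0-flash   # hypothetical model id for illustration
instruction: |
  Classify the incoming ticket and route it to billing, technical, or general.
tools:
  - lookup_order
  - escalate_to_human
"""

@dataclass
class AgentConfig:
    name: str
    model: str
    instruction: str
    tools: list[str] = field(default_factory=list)

def load_agent_config(text: str) -> AgentConfig:
    """Parse a YAML agent definition into a typed config object."""
    return AgentConfig(**yaml.safe_load(text))

config = load_agent_config(AGENT_YAML)
print(config.name, config.tools)  # behavior is data; only the runtime is code
```

The appeal of this split is that prompt and tool changes become reviewable data edits, while the runtime code stays untouched.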
# AI Agents
# Generative AI Agents
# Artificial Intelligence
# Google ADK
# Data Science
Jon Saad-Falcon, Jimin (Anna) Yoon & Arthur Coleman · Dec 1st, 2025
Language models are getting better at reasoning, but their ability to verify their own outputs still lags behind. This paper tackles that challenge head-on by introducing Weaver, a framework that combines multiple weak verifiers into a single, stronger verifier without relying heavily on labeled data. Weaver uses weak supervision to estimate verifier reliability, normalize inconsistent outputs, and filter low-quality signals, resulting in a unified score that better reflects true response quality. In practice, this approach significantly boosts performance on reasoning and math tasks, rivaling models several times larger; for example, it achieves o3-mini-level accuracy using only Llama 3.3 70B as the generator.
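As a toy illustration of the general idea (this is not the paper's actual weak-supervision estimator), one can normalize each weak verifier's scores, derive a crude reliability weight from agreement with the other verifiers, and aggregate into a single unified score:

```python
# Toy weak-verifier aggregation: weight each verifier by how closely it tracks
# the consensus, then combine into one score per candidate response.
import statistics

def unified_score(verifier_scores: dict[str, list[float]]) -> list[float]:
    """verifier_scores maps verifier name -> one score in [0, 1] per candidate."""
    names = list(verifier_scores)
    n_candidates = len(next(iter(verifier_scores.values())))

    # Consensus per candidate: mean score across all verifiers.
    consensus = [statistics.mean(verifier_scores[v][i] for v in names)
                 for i in range(n_candidates)]

    # Crude reliability: verifiers that agree with the consensus get more weight.
    weights = {}
    for v in names:
        err = statistics.mean(abs(verifier_scores[v][i] - consensus[i])
                              for i in range(n_candidates))
        weights[v] = 1.0 - err

    total = sum(weights.values())
    return [sum(weights[v] * verifier_scores[v][i] for v in names) / total
            for i in range(n_candidates)]

# Three weak verifiers scoring four candidate answers to the same question.
scores = {
    "reward_model":   [0.80, 0.40, 0.75, 0.20],
    "lm_judge":       [0.90, 0.35, 0.60, 0.30],
    "unit_test_pass": [1.00, 0.00, 1.00, 0.00],
}
print(unified_score(scores))  # pick the highest-scoring candidate
```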
# LLM Verification
# Weak Verifiers
# RAG Systems
Developing AI agents for shopping is just the first step; the real challenge is reliably running them in production across complex, mission-critical e-commerce systems—a significant MLOps hurdle. In this talk, we'll introduce Alfred, our agentic orchestration layer. Built with tools like LangGraph, Langfuse, LiteLLM, and Google Cloud components, Alfred is the critical piece that coordinates LLMs with our entire e-commerce backend—from search and recommendations to cart management. It handles the complete execution graph, secure tool calling, and prompt workflows. We’ll share our journey in designing a reusable agent architecture that scales across all our digital properties. We’ll discuss the specifics of our tech stack and productionization methodology, including how we leveraged the MCP framework and our existing platform APIs to accelerate development of Alfred.
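For readers unfamiliar with the orchestration-layer pattern, here is a minimal LangGraph sketch of the routing idea; it is illustrative only and not Alfred's actual graph, with keyword stubs standing in for LLM calls and backend APIs.

```python
# Minimal LangGraph routing sketch (illustrative, not Alfred's real graph):
# a router node decides which backend capability handles the request.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict, total=False):
    query: str
    route: str
    result: str

def route_request(state: State) -> dict:
    # In a production orchestrator this would be an LLM call; here it's a stub.
    return {"route": "cart" if "cart" in state["query"].lower() else "search"}

def search_products(state: State) -> dict:
    return {"result": f"search results for: {state['query']}"}

def manage_cart(state: State) -> dict:
    return {"result": f"cart updated based on: {state['query']}"}

graph = StateGraph(State)
graph.add_node("router", route_request)
graph.add_node("search", search_products)
graph.add_node("cart", manage_cart)
graph.set_entry_point("router")
graph.add_conditional_edges("router", lambda s: s["route"],
                            {"search": "search", "cart": "cart"})
graph.add_edge("search", END)
graph.add_edge("cart", END)

app = graph.compile()
print(app.invoke({"query": "add running shoes to my cart"}))
```

In production, the router would be an LLM call with secured tool schemas, and each node would hit the platform APIs the talk describes.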
# Agents in Production
# Prosus Group
# Agentic Commerce
Single-agent LLM systems fail silently in production - they're confidently wrong at scale with no mechanism for self-correction. We've deployed a multi-agent orchestration pattern called "structured dissent" where believer, skeptic, and neutral agents debate decisions before consensus. This isn't theoretical - we'll show production deployment patterns, cost/performance tradeoffs, and measurable reliability improvements. You'll learn when multi-agent architectures justify the overhead, how to orchestrate adversarial agents effectively, and operational patterns for monitoring agent reasoning quality in production. Our first deployment of the debate swarm revolves around MCP servers - we use a security swarm specially built for MCP servers to analyze findings from open source security tools. This provides more nuanced reasoning and gives a confidence score to evaluate the security of unknown MCP tools.
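A minimal sketch of the structured-dissent pattern might look like the following; `call_llm` is a hypothetical stand-in for a real model client, and the consensus step is where the confidence score mentioned above would be produced.

```python
# Sketch of the "structured dissent" debate pattern: three role-specialized
# agents argue, then a consensus pass commits to a decision.
ROLES = {
    "believer": "Argue why the proposed action is safe and should proceed.",
    "skeptic":  "Argue why the proposed action is risky and should be blocked.",
    "neutral":  "Weigh both sides impartially and note what evidence is missing.",
}

def call_llm(system: str, prompt: str) -> str:
    """Hypothetical stand-in for an actual LLM client call."""
    return f"[{system.split('.')[0]}] analysis of: {prompt}"

def structured_dissent(proposal: str) -> dict:
    # Each role sees the same proposal but argues from its assigned stance.
    arguments = {role: call_llm(instruction, proposal)
                 for role, instruction in ROLES.items()}

    # Consensus pass: a final call sees all three arguments and must commit to
    # a decision plus a confidence score operators can monitor in production.
    verdict = call_llm(
        "You are the consensus judge. Return APPROVE or REJECT with a confidence 0-1.",
        "\n\n".join(f"{role.upper()}:\n{text}" for role, text in arguments.items()),
    )
    return {"proposal": proposal, "arguments": arguments, "verdict": verdict}

print(structured_dissent("Allow this unknown MCP tool to access the filesystem"))
```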
# Agents in Production
# Prosus Group
# Production Reliability
Personalization at scale needs a deep understanding of each customer. You must collect data from many sources, read it, reason and infer, plan, decide, act, and write to each person. One agent doing everything gave us poor and inconsistent quality. Multi-agent systems changed that: they deliver mass personalization, but they also break in edge cases, contradict each other, and are hard to debug. I will share how we addressed this with Cortex UCM, a unified customer memory, and Generative Tables. We map noisy data into a clean, structured layer that agents read and write. We began with email for both outbound and inbound communication, then personalized websites and product pages for e-commerce at scale. I will share customer stories; for example, one customer had over 60,000 product pages that required customization for thousands of communities and product offerings. I will briefly present our decentralized shared-memory orchestration, how it stays transparent and debuggable, and how it opens safe paths for external agents. What failed. What worked. What we are building next.
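To illustrate the shared-memory idea in this abstract (a toy sketch, not Cortex UCM's actual API), agents coordinate only by reading and writing one structured, auditable record per customer:

```python
# Toy unified-customer-memory sketch: agents never message each other directly;
# they read from and write to a single structured record, leaving an audit trail.
from datetime import datetime, timezone

class CustomerMemory:
    """Unified, structured record that every agent reads and writes."""
    def __init__(self):
        self._store: dict[str, dict] = {}
        self.audit_log: list[tuple] = []  # who wrote what, and when

    def read(self, customer_id: str) -> dict:
        return dict(self._store.get(customer_id, {}))

    def write(self, customer_id: str, agent: str, field: str, value) -> None:
        self._store.setdefault(customer_id, {})[field] = value
        self.audit_log.append(
            (datetime.now(timezone.utc), agent, customer_id, field, value))

memory = CustomerMemory()
memory.write("cust-42", agent="email-inbound", field="intent", value="asked about sizing")
memory.write("cust-42", agent="web-personalizer", field="segment", value="running")
print(memory.read("cust-42"))
print(memory.audit_log)  # the debuggable trail of which agent changed what
```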
# Agents in Production
# Prosus Group
# Multi-Agent Personalization
Matt Sharp · Nov 27th, 2025
There's never been a better time to be a hacker. With the explosion of vibe-coded solutions full of vulnerabilities, and the power and ease that LLMs and agents lend to hackers, we are seeing an increase in attacks. This talk dives into several vulnerabilities that agent systems have introduced and how they are already being exploited.
# Agents in Production
# Prosus Group
# Hacker
We reflect on how the complexity of an agent analytics project at an international pharma company taught us to move from prompt engineering to context engineering, empowering agents with interactive tooling to build their context dynamically.
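A minimal sketch of that shift, under the assumption that the agent is given small metadata tools it can call on demand rather than one giant prompt (the helpers below are hypothetical):

```python
# Context engineering sketch: instead of stuffing every schema into the prompt,
# the agent assembles only the context it needs via small, on-demand tools.
def list_tables() -> list[str]:
    """Stub for a metadata lookup the agent can call on demand."""
    return ["sales_2024", "clinical_trials", "inventory"]

def describe_table(name: str) -> dict:
    """Stub returning schema details only when the agent asks for them."""
    schemas = {"sales_2024": {"columns": ["region", "product", "revenue"]}}
    return schemas.get(name, {"columns": []})

TOOLS = {"list_tables": list_tables, "describe_table": describe_table}

# The agent builds its own context: discover tables, fetch only the relevant
# schema, and only then draft a query for the question at hand.
question = "Which region drove the most 2024 revenue?"
tables = TOOLS["list_tables"]()
schema = TOOLS["describe_table"]("sales_2024")
print(question, tables, schema)
```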
# Agents in Production
# Prosus Group
# Enterprise Analytics
Santoshkalyan Rayadhurgam · Nov 27th, 2025
Search is still the front door of most digital products—and it’s brittle. Keyword heuristics and static ranking pipelines struggle with messy, ambiguous queries. Traditionally, fixing this meant years of hand-engineering and expensive labeling. Large language models change that equation: they let us deploy agents that act like search engineers—rewriting queries, disambiguating intent, and even judging relevance on the fly. In this talk, I’ll show how to put these agents to work in real production systems. We’ll look at simple but powerful patterns—query rewriting, hybrid retrieval, agent-based reranking—and what actually happens when you deploy them at scale. You’ll hear about the wins, the pitfalls, and the open questions. The goal: to leave you with a practical playbook for how agents can make search smarter, faster, and more adaptive—without turning your system into a black box.
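As an illustrative sketch of the patterns named above (query rewriting, hybrid retrieval, agent-based reranking), with a stubbed `llm` helper standing in for a real model call and trivial retrieval logic to keep it runnable:

```python
# Search-agent sketch: rewrite a messy query, retrieve candidates, then let an
# "agent" rerank them. All model and retrieval calls are stubs for illustration.
def llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM client; echoes the last line for demo."""
    return prompt.splitlines()[-1]

def rewrite_query(raw_query: str) -> str:
    # The agent cleans up a messy, ambiguous query before retrieval.
    return llm(f"Rewrite this search query to be specific and unambiguous:\n{raw_query}")

def hybrid_retrieve(query: str, k: int = 20) -> list[str]:
    # Stub: a real system would merge keyword (BM25) and vector results here.
    return [f"doc-{i}: {query}" for i in range(k)]

def agent_rerank(query: str, docs: list[str], k: int = 5) -> list[str]:
    # The agent judges relevance on the fly; a trivial overlap score keeps it runnable.
    scored = sorted(docs,
                    key=lambda d: len(set(query.split()) & set(d.split())),
                    reverse=True)
    return scored[:k]

query = rewrite_query("cheap shoes run marathon??")
results = agent_rerank(query, hybrid_retrieve(query))
print(results)
```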
# Agents in Production
# Prosus Group
# Search Engine Agents
I went viral for spending $9k on Cursor in 1 month, and I wrote half a million lines of code building Zo Computer over the last 4 months. In this talk, I'll share everything I would've wanted to know when I was just getting started. Coding with AI done right is *leveraged thinking*, and I'll share the workflows we use on our team to ship high-quality code at breakneck speed.
# Agents in Production
# Prosus Group
# Velocity Coding