MLOps Community

Agents in Production - Prosus x MLOps
Ioana Apetrei, Igor Šušić & Adam Becker · Feb 19th, 2026
Serving LLMs in Production: Performance, Cost & Scale // CAST AI Roundtable
Experimenting with LLMs is easy. Running them reliably and cost-effectively in production is where things break. Most AI teams never make it past demos and proofs of concept. A smaller group is pushing real workloads to production—and running into very real challenges around infrastructure efficiency, runaway cloud costs, and reliability at scale. This session is for engineers and platform teams moving beyond experimentation and building AI systems that actually hold up in production.
# AI Applications
# GPU Orchestration
# Kubernetes Clusters
# CAST AI
Rahul Raja & Demetrios Brinkmann · Feb 17th, 2026
Information Retrieval is evolving from keyword matching to intelligent, vector-based understanding. In this talk, Rahul Raja explores how dense retrieval, vector databases, and hybrid search systems are redefining how modern AI retrieves, ranks, and reasons over information. He discusses how retrieval now powers large language models through Retrieval-Augmented Generation (RAG), and the new MLOps challenges that arise: embedding drift, continuous evaluation, and large-scale vector maintenance. Looking ahead, the session envisions a future of Cognitive Search, where retrieval systems move beyond recall to genuine reasoning, contextual understanding, and multimodal awareness. Listeners will gain insight into how the next generation of retrieval will bridge semantics, scalability, and intelligence, powering everything from search and recommendations to generative AI.
# AI Agents
# AI Engineer
# AI agents in production
# AI Agents use case
# System Design
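The hybrid search idea described above can be sketched as a weighted blend of a sparse keyword signal and a dense embedding signal. The function names, the toy scoring scheme, and the `alpha` weight below are illustrative assumptions, not any particular search engine's API:

```python
import math
from collections import Counter

def keyword_score(query: str, doc: str) -> float:
    """Sparse signal: fraction of query terms that appear in the document."""
    terms = query.lower().split()
    doc_terms = Counter(doc.lower().split())
    return sum(1 for t in terms if doc_terms[t] > 0) / len(terms)

def cosine(a: list[float], b: list[float]) -> float:
    """Dense signal: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, doc: str,
                 q_vec: list[float], d_vec: list[float],
                 alpha: float = 0.5) -> float:
    """Blend the two signals; alpha is a tunable weight, set here arbitrarily."""
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)
```

Production systems typically replace the keyword half with BM25 and the dense half with an ANN index, but the blending step looks much like this.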
This post details the practical application of the A2UI protocol, introducing the Agent-View-Controller (AVC) pattern to decouple agent logic from UI rendering. It highlights that while A2UI enables secure, adaptable interfaces, a hybrid architecture combining static and dynamic elements is often required to balance expressiveness with latency.
# Artificial Intelligence
# Generative UI
# Software Architecture
# LLMs
# Frontend Development
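The agent/renderer decoupling at the heart of the AVC pattern can be illustrated with a toy declarative payload: the agent emits a UI description as data, and the client decides how to render it. The payload shape and component names below are invented for illustration and are not the actual A2UI schema:

```python
# Toy illustration of agent/renderer decoupling: the agent produces a
# declarative UI tree; the client maps component types to widgets.
# This payload shape is hypothetical, not the real A2UI schema.
AGENT_OUTPUT = {
    "type": "column",
    "children": [
        {"type": "text", "value": "Found 3 flights"},
        {"type": "button", "label": "Book cheapest", "action": "book_flight"},
    ],
}

def render(node: dict, indent: int = 0) -> str:
    """A minimal text 'renderer' standing in for a real client."""
    pad = "  " * indent
    if node["type"] == "text":
        return f"{pad}[text] {node['value']}"
    if node["type"] == "button":
        return f"{pad}[button] {node['label']} -> {node['action']}"
    if node["type"] == "column":
        lines = [f"{pad}[column]"]
        lines += [render(child, indent + 1) for child in node["children"]]
        return "\n".join(lines)
    # Unknown component types are skipped rather than executed: the agent
    # ships data, never code, which is what keeps the interface secure.
    return f"{pad}[unsupported: {node['type']}]"
```

Because the agent only emits data, the client can swap renderers (web, mobile, terminal) without touching agent logic, which is the decoupling the post argues for.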
Vincent D. Warmerdam & Demetrios Brinkmann · Feb 13th, 2026
Vincent Warmerdam joins Demetrios fresh off marimo’s acquisition by Weights & Biases—and makes a bold claim: notebooks as we know them are outdated. They talk Molab (GPU-backed, cloud-hosted notebooks), LLMs that don’t just chat but actually fix your SQL and debug your code, and why most data folks are consuming tools instead of experimenting. Vincent argues we should stop treating notebooks like static scratchpads and start treating them like dynamic apps powered by AI. It’s a conversation about rethinking workflows, reclaiming creativity, and not outsourcing your brain to the model.
# Vincent D. Warmerdam
# Calmcode
# marimo
# wandb
# Jupyter Notebooks
# Data Science
A conversation on how AI coding agents are changing the way we build and operate production systems. We explore the practical boundaries between agentic and deterministic code, strategies for shared responsibility across models, engineering teams, and customers, and how to evaluate agent performance at scale. Topics include production quality gates, safety and cost tradeoffs, managing long-tail failures, and deployment patterns that let you ship agents with confidence.
# AI Agents
# AI Engineer
# AI agents in production
# AI Agents use case
# System Design
Addressing the challenge of AI agent exposition, this post evaluates various implementation paths, including full-stack frameworks and AI-generated code. It identifies A2UI as a promising declarative solution that enables dynamic, secure interfaces by decoupling the agent's logic from the client's rendering capabilities.
# Artificial Intelligence
# UI
# Generative AI Tools
# AI Agent
# Software Development
Nick Gillian & Demetrios Brinkmann · Feb 6th, 2026
As AI moves beyond the cloud and simulation, the next frontier is Physical AI: systems that can perceive, understand, and act within real-world environments in real time. In this conversation, Nick Gillian, Co-Founder and CTO of Archetype AI, explores what it actually takes to turn raw sensor and video data into reliable, deployable intelligence. Drawing on his experience building Google’s Soli and Jacquard and now leading development of Newton, a foundational model for Physical AI, Nick discusses how real-time physical understanding changes what’s possible across safety monitoring, infrastructure, and human–machine interaction. He shares lessons learned translating advanced research into products that operate safely in dynamic environments, and why many organizations underestimate the challenges and opportunities of AI in the physical world.
# AI Agents
# AI Engineer
# AI agents in production
# AI Agents use case
# System Design
This blog explains a systematic way to fix CUDA out-of-memory (OOM) errors during GRPO reinforcement learning training, instead of randomly lowering hyperparameters until something works. Subham argues that most GPU memory issues come from three sources: vLLM reserving GPU memory upfront (often the biggest chunk), training activations (which scale with batch size, sequence length, number of generations, and model size), and model memory (usually the smallest contributor). By carefully reading the OOM error message and estimating how memory is distributed across these components, you can identify exactly what’s causing the crash. The recommended approach is to calculate memory usage first, then adjust the highest-impact settings, such as GPU memory allocation for vLLM, number of generations, batch size, and sequence length. The guide also shows how to maintain training quality by using techniques like gradient accumulation instead of simply shrinking everything. Overall, the key message is: treat OOM debugging as a measurable engineering problem, not trial-and-error, so you can fix memory issues faster while preserving training performance.
# GRPO
# CUDA
# GPU Memory
# LLM Training
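The component-by-component accounting the post recommends can be sketched as a back-of-the-envelope calculator. The function name, the bf16 assumption, and the simplified activation formula below are illustrative; real usage should be confirmed against profiler output such as `torch.cuda.memory_summary()`:

```python
def estimate_grpo_memory_gb(
    n_params_b: float,   # model size in billions of parameters
    gpu_total_gb: float, # total GPU memory on the device
    vllm_gpu_util: float,# vLLM's gpu_memory_utilization setting (fraction)
    batch_size: int,
    num_generations: int,
    seq_len: int,
    hidden_size: int,
    n_layers: int,
    bytes_per_elem: int = 2,  # bf16; assumption, not a measurement
) -> dict:
    """Rough per-component GPU memory estimate for GRPO training.

    The three buckets mirror the post's breakdown: vLLM's upfront
    reservation, training activations, and model weights. All formulas
    are simplified estimates for reasoning about which knob to turn.
    """
    # vLLM reserves a fixed fraction of the GPU up front.
    vllm_gb = gpu_total_gb * vllm_gpu_util
    # Model weights: parameters times bytes per parameter.
    model_gb = n_params_b * 1e9 * bytes_per_elem / 1e9
    # Activations scale with effective batch (batch * generations),
    # sequence length, hidden size, and depth.
    act_gb = (batch_size * num_generations * seq_len
              * hidden_size * n_layers * bytes_per_elem) / 1e9
    return {
        "vllm": vllm_gb,
        "model": model_gb,
        "activations": act_gb,
        "total": vllm_gb + model_gb + act_gb,
    }
```

Running this before launching a job shows which bucket dominates, so you can lower `vllm_gpu_util` or `num_generations` deliberately instead of shrinking every hyperparameter at once.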
Hundreds of neocloud operators and "AI Factory" builders have emerged to serve the insatiable demand for AI infrastructure. These teams are compressing the design, build, deploy, operate, scale cycle of their infrastructure down to months, while managing massive footprints with lean teams. How? By applying modern intent-driven infrastructure automation principles to greenfield deployments. We'll explore how these teams carry design intent through to production, and how operating and automating around consistent infrastructure data is compressing "time to first train".
# AI Agents
# AI Engineer
# AI agents in production
# AI Agents use case
# System Design
Mike Oaten & Demetrios Brinkmann · Jan 27th, 2026
As AI models move into high-stakes environments like Defence and Financial Services, standard input/output testing, evals, and monitoring are becoming dangerously insufficient. To comply with the EU AI Act, NIST AI RMF, and other requirements, MLOps teams need to access and analyse the internal reasoning of their models. In this session, Mike introduces Tikos's patent-pending AI assurance technology, which moves beyond statistical proxies. He breaks down the architecture of the Synapses Logger, which embeds directly into the neural activation flow to capture weights, activations, and activation paths in real time.
# EU AI Act
# Regulations Compliance
# Tikos