MLOps Community

Agents in Production - Prosus x MLOps
31 items

Chris Fregly & Demetrios Brinkmann · Feb 24th, 2026
Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs
In today’s era of massive generative models, it's important to understand the full scope of AI systems' performance engineering. This talk introduces the new O'Reilly book, AI Systems Performance Engineering, and the accompanying GitHub repo (https://github.com/cfregly/ai-performance-engineering), giving engineers, researchers, and developers a set of actionable optimization strategies. You'll learn techniques to co-design and co-optimize hardware, software, and algorithms to build resilient, scalable, and cost-effective AI systems for both training and inference.
# NVIDIA GPUs
# CUDA framework
# GitHub repo
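A staple of the kind of performance engineering the talk covers is the roofline model: before optimizing a kernel, estimate whether it is compute-bound or memory-bound. A minimal back-of-envelope sketch, with illustrative (not vendor-spec) hardware numbers:

```python
# Roofline-style check: is a GEMM compute- or memory-bound?
# The peak-FLOP and bandwidth numbers below are illustrative assumptions.

def gemm_arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for C[m,n] = A[m,k] @ B[k,n] (fp16/bf16 by default)."""
    flops = 2 * m * n * k                                   # one mul + one add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A and B, write C
    return flops / bytes_moved

def is_compute_bound(intensity: float,
                     peak_tflops: float = 989.0,   # assumed fp16 peak, TFLOP/s
                     mem_bw_tbps: float = 3.35) -> bool:
    """Compare intensity against the ridge point: peak FLOPs / peak bandwidth."""
    ridge = (peak_tflops * 1e12) / (mem_bw_tbps * 1e12)
    return intensity >= ridge

large = gemm_arithmetic_intensity(4096, 4096, 4096)
small = gemm_arithmetic_intensity(1, 4096, 4096)  # GEMV-like decode-step shape
print(f"square GEMM: {large:.0f} FLOP/B, compute-bound={is_compute_bound(large)}")
print(f"skinny GEMM: {small:.2f} FLOP/B, compute-bound={is_compute_bound(small)}")
```

The skinny matmul shape (typical of single-token decoding) lands far below the ridge point, which is why inference optimization often targets memory traffic rather than raw FLOPs.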
Axel Mendoza · Feb 24th, 2026
A hands-on beginner roadmap for learning Kubernetes, designed to walk you through core concepts (like clusters, pods, services, deployments, storage, RBAC, autoscaling, etc.) with simple explanations, CLI examples, and practical exercises. By following it you build real experience and are prepared to use Kubernetes locally or on cloud platforms like GKE or EKS.
# DevOps
# Kubernetes
# From Scratch
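The core idea behind the Kubernetes concepts the roadmap covers (deployments, autoscaling) is reconciliation: a controller compares desired state with observed state and acts to close the gap. A toy loop in plain Python, purely to illustrate the concept, not any real controller's code:

```python
# Toy reconciliation loop: scale a set of "pods" toward a desired replica count,
# the way a Deployment controller converges observed state to spec.

def reconcile(desired_replicas: int, running: list[str]) -> list[str]:
    pods = list(running)
    while len(pods) < desired_replicas:   # scale up: create missing pods
        pods.append(f"pod-{len(pods)}")
    while len(pods) > desired_replicas:   # scale down: remove extras
        pods.pop()
    return pods

state = reconcile(desired_replicas=3, running=["pod-0"])
print(state)  # → ['pod-0', 'pod-1', 'pod-2']
```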
Ioana Apetrei, Igor Šušić & Adam Becker · Feb 19th, 2026
Experimenting with LLMs is easy. Running them reliably and cost-effectively in production is where things break. Most AI teams never make it past demos and proofs of concept. A smaller group is pushing real workloads to production—and running into very real challenges around infrastructure efficiency, runaway cloud costs, and reliability at scale. This session is for engineers and platform teams moving beyond experimentation and building AI systems that actually hold up in production.
# AI Applications
# GPU Orchestration
# Kubernetes Clusters
# CAST AI
Rahul Raja & Demetrios Brinkmann · Feb 17th, 2026
Information Retrieval is evolving from keyword matching to intelligent, vector-based understanding. In this talk, Rahul Raja explores how dense retrieval, vector databases, and hybrid search systems are redefining how modern AI retrieves, ranks, and reasons over information. He discusses how retrieval now powers large language models through Retrieval-Augmented Generation (RAG) and the new MLOps challenges that arise: embedding drift, continuous evaluation, and large-scale vector maintenance. Looking ahead, the session envisions a future of Cognitive Search, where retrieval systems move beyond recall to genuine reasoning, contextual understanding, and multimodal awareness. Listeners will gain insight into how the next generation of retrieval will bridge semantics, scalability, and intelligence, powering everything from search and recommendations to generative AI.
# AI Agents
# AI Engineer
# AI agents in production
# AI Agents use case
# System Design
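One common way the hybrid systems mentioned above merge a keyword ranking with a vector-similarity ranking is Reciprocal Rank Fusion (RRF). A minimal sketch, with made-up document IDs:

```python
# Reciprocal Rank Fusion: combine several ranked lists without needing their
# raw scores to be comparable. k=60 is the constant from the original RRF paper.

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from BM25
vector_hits  = ["doc1", "doc9", "doc3"]  # e.g. from an ANN index
fused = rrf([keyword_hits, vector_hits])
print(fused[0][0])  # → doc1 (strong in both lists)
```

Because RRF only uses ranks, it sidesteps the score-normalization problem that arises when BM25 scores and cosine similarities live on different scales.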
This post details the practical application of the A2UI protocol, introducing the Agent-View-Controller (AVC) pattern to decouple agent logic from UI rendering. It highlights that while A2UI enables secure, adaptable interfaces, a hybrid architecture combining static and dynamic elements is often required to balance expressiveness with latency.
# Artificial Intelligence
# Generative Ui
# Software Architecture
# LLMs
# Frontend Development
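The Agent-View-Controller idea above can be sketched in a few lines: the agent emits a declarative UI spec as plain data, a controller validates it against a component allow-list (the security boundary), and only then does any client render it. The spec shape and component names here are invented for illustration and are not A2UI's actual schema:

```python
# Hypothetical AVC sketch: agent logic produces data, never markup or code.

ALLOWED_COMPONENTS = {"text", "button", "list"}

def agent_reply(question: str) -> dict:
    """Agent: describe the UI declaratively."""
    return {"type": "list", "children": [
        {"type": "text", "value": f"Results for: {question}"},
        {"type": "button", "label": "Refine search"},
    ]}

def validate(spec: dict) -> bool:
    """Controller: reject anything outside the allow-list before rendering."""
    if spec.get("type") not in ALLOWED_COMPONENTS:
        return False
    return all(validate(child) for child in spec.get("children", []))

spec = agent_reply("vector databases")
print(validate(spec))  # → True: safe to hand to the client's renderer
```

Decoupling in this way is what lets the same agent output render on different clients, while the allow-list keeps a misbehaving model from injecting arbitrary UI.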
Vincent D. Warmerdam & Demetrios Brinkmann · Feb 13th, 2026
Vincent Warmerdam joins Demetrios fresh off marimo’s acquisition by Weights & Biases—and makes a bold claim: notebooks as we know them are outdated. They talk Molab (GPU-backed, cloud-hosted notebooks), LLMs that don’t just chat but actually fix your SQL and debug your code, and why most data folks are consuming tools instead of experimenting. Vincent argues we should stop treating notebooks like static scratchpads and start treating them like dynamic apps powered by AI. It’s a conversation about rethinking workflows, reclaiming creativity, and not outsourcing your brain to the model.
# Vincent D. Warmerdam
# Calmcode
# marimo
# wandb
# Jupyter Notebooks
# Data Science
A conversation on how AI coding agents are changing the way we build and operate production systems. We explore the practical boundaries between agentic and deterministic code, strategies for shared responsibility across models, engineering teams, and customers, and how to evaluate agent performance at scale. Topics include production quality gates, safety and cost tradeoffs, managing long-tail failures, and deployment patterns that let you ship agents with confidence.
# AI Agents
# AI Engineer
# AI agents in production
# AI Agents use case
# System Design
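The "production quality gates" mentioned above can be as simple as a check that blocks a rollout unless offline eval metrics clear fixed thresholds. A minimal sketch; the metric names and thresholds are illustrative assumptions, not a specific team's bar:

```python
# Quality gate: refuse to ship an agent whose eval metrics miss the thresholds.
# Metric names and threshold values are made up for illustration.

THRESHOLDS = {"task_success_rate_min": 0.90, "unsafe_output_rate_max": 0.01}

def gate(metrics: dict[str, float]) -> tuple[bool, list[str]]:
    failures = []
    if metrics.get("task_success_rate", 0.0) < THRESHOLDS["task_success_rate_min"]:
        failures.append("task_success_rate below threshold")
    if metrics.get("unsafe_output_rate", 1.0) > THRESHOLDS["unsafe_output_rate_max"]:
        failures.append("unsafe_output_rate above threshold")
    return (not failures, failures)

ok, why = gate({"task_success_rate": 0.93, "unsafe_output_rate": 0.004})
print("ship" if ok else f"hold: {why}")  # → ship
```

Note the defaults: a missing metric counts as a failure, so an incomplete eval run can never slip through the gate.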
Addressing the challenge of AI agent exposition, this post evaluates various implementation paths, including full-stack frameworks and AI-generated code. It identifies A2UI as a promising declarative solution that enables dynamic, secure interfaces by decoupling the agent's logic from the client's rendering capabilities.
# Artificial Intelligence
# UI
# Generative AI Tools
# AI Agent
# Software Development
Nick Gillian & Demetrios Brinkmann · Feb 6th, 2026
As AI moves beyond the cloud and simulation, the next frontier is Physical AI: systems that can perceive, understand, and act within real-world environments in real time. In this conversation, Nick Gillian, Co-Founder and CTO of Archetype AI, explores what it actually takes to turn raw sensor and video data into reliable, deployable intelligence. Drawing on his experience building Google’s Soli and Jacquard and now leading development of Newton, a foundational model for Physical AI, Nick discusses how real-time physical understanding changes what’s possible across safety monitoring, infrastructure, and human–machine interaction. He’ll share lessons learned translating advanced research into products that operate safely in dynamic environments, and why many organizations underestimate the challenges and opportunities of AI in the physical world.
# AI Agents
# AI Engineer
# AI agents in production
# AI Agents use case
# System Design
This blog explains a systematic way to fix CUDA out-of-memory (OOM) errors during GRPO reinforcement learning training, instead of randomly lowering hyperparameters until something works. Subham argues that most GPU memory issues come from three sources: vLLM reserving GPU memory upfront (often the biggest chunk), training activations (which scale with batch size, sequence length, number of generations, and model size), and model memory (usually the smallest contributor). By carefully reading the OOM error message and estimating how memory is distributed across these components, you can identify exactly what’s causing the crash. The recommended approach is to calculate memory usage first, then adjust the highest-impact settings, such as GPU memory allocation for vLLM, number of generations, batch size, and sequence length. The guide also shows how to maintain training quality by using techniques like gradient accumulation instead of simply shrinking everything. Overall, the key message is: treat OOM debugging as a measurable engineering problem, not trial-and-error, so you can fix memory issues faster while preserving training performance.
# GRPO
# CUDA
# GPU Memory
# LLM Training
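The post's "calculate first, then adjust" approach can be sketched as a back-of-envelope budget over the three buckets it names: the vLLM reservation, training activations, and model weights. The activation constant below is a rough assumption; real usage depends on architecture, precision, and kernel choices:

```python
# Back-of-envelope GPU memory budget for GRPO training with a vLLM sidecar.
# The 36-bytes-per-token-per-hidden-dim activation factor is an assumption.

GiB = 1024 ** 3

def budget(total_gib: float,
           model_params_b: float,                 # parameters, in billions
           vllm_gpu_memory_utilization: float,    # fraction vLLM reserves upfront
           batch: int, seq_len: int, num_generations: int, hidden: int,
           bytes_per_param: int = 2,              # bf16 weights
           act_bytes_per_token_per_hidden: int = 36) -> dict[str, float]:
    vllm = total_gib * vllm_gpu_memory_utilization
    weights = model_params_b * 1e9 * bytes_per_param / GiB
    acts = (batch * num_generations * seq_len * hidden
            * act_bytes_per_token_per_hidden) / GiB
    return {"vllm": vllm, "weights": weights, "activations": acts,
            "free": total_gib - vllm - weights - acts}

b = budget(total_gib=80, model_params_b=7, vllm_gpu_memory_utilization=0.5,
           batch=4, seq_len=1024, num_generations=8, hidden=4096)
print({k: round(v, 1) for k, v in b.items()})
```

Running the numbers like this before touching hyperparameters shows which bucket dominates, so you lower `gpu_memory_utilization` or `num_generations` where it actually matters instead of shrinking everything at once.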