MLOps Community
Serving LLMs in Production: Performance, Cost & Scale
LIVESTREAM

Serving LLMs in Production with Cast AI

Performance, Cost, and Scale.

Experimenting with LLMs is easy. Running them reliably and cost-effectively in production is where things break.

Most AI teams never make it past demos and proofs of concept. A smaller group is pushing real workloads to production—and running into very real challenges around infrastructure efficiency, runaway cloud costs, and reliability at scale.

This session is for engineers and platform teams moving beyond experimentation and building AI systems that actually hold up in production.

---

The Reality of Production AI

We’ll skip the hype and focus on the challenges that show up once traffic, latency, and cost matter:

  1. The Cost Problem: Why LLM workloads drive cloud bills through the roof, and how smarter infrastructure and automation can rein them in.
  2. The Infrastructure Shift: What it takes to move from managed cloud APIs to production-ready, Kubernetes-based environments optimized for AI workloads.
  3. Reliability at Scale: How to maintain performance and stability as models, agents, and traffic grow more complex.

---

What You’ll Learn

Using concrete, real-world examples—including agentic and multi-model workloads—we’ll walk through patterns that teams are successfully running in production today.

You’ll learn:

  1. Battle-Tested Architectures: Infrastructure patterns that survive real traffic and real cost pressure.
  2. Agentic Workloads in Practice: How to manage the unique latency, scaling, and cost demands of autonomous agents.
  3. Scaling Without Breaking: Practical steps to go from prototype to production while keeping performance predictable and spend under control.

If you’re ready to move past “it works on my machine” and build AI systems that are efficient, reliable, and production-ready, this session is for you.

[Register Now]

Speakers

Igor Šušić
Staff Machine Learning Engineer @ Cast AI
Ioana Apetrei
Senior Product Manager @ Cast AI
Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

Agenda

From 5:00 PM to 6:00 PM GMT
Tags: Presentation
Round Table Discussion - Serving LLMs in Production: Performance, Cost & Scale

Most AI initiatives are stuck in experimentation, with only a small fraction successfully reaching production at scale. In this session, we’ll walk through the real technical and product challenges teams face when scaling AI workloads, why costs explode, and what patterns actually work in practice. Using concrete examples (including agentic use cases), we’ll share how teams can move from cloud APIs to efficient, production-grade infrastructure without losing reliability.


February 05, 5:00 PM GMT
Online
Organized by MLOps Community