MLOps Community
AI in Production 2025
LIVESTREAM

# AI in Production 2025

We’re Back for Round Two!

The AI in Production 2025 event builds on last year's momentum, with a sharper focus on the toughest challenges of deploying AI at scale.

LLMs and AI applications are advancing rapidly, but the production hurdles haven't gone away. This year we're tackling the hard stuff: managing costs, meeting latency requirements, debugging complex systems, building trust in outputs, and, of course, agents.

Hear straight from the people making it happen and see how they’re solving problems you might be facing today.




Speakers

Shreya Rajpal
Creator @ Guardrails AI
Afshaan Mazagonwalla
AI Engineer @ Google Cloud Consulting
Vasu Sharma
Applied Research Scientist @ Meta (FAIR)
Amy Hodler
Executive Director @ GraphGeeks
Jessica Talisman
Senior Information Architect @ Adobe
Raza Habib
CEO and Co-founder @ Humanloop
Erik Bernhardsson
Founder @ Modal Labs
Hala Nelson
Author and Professor of Mathematics @ James Madison University
Merrell Stone
Research, Strategic Foresight and Human Systems @ Avanade
Bassey Etim
Senior Director - Content Strategy @ Pluralsight
Erica Greene
Director of Engineering, Machine Learning @ Yahoo
Guanhua Wang
Senior Researcher @ Microsoft
Sishi Long
Staff Software Engineer @ Uber
Michael Gschwind
Distinguished Engineer @ NVIDIA
Chiara Caratelli
Data Scientist @ Prosus Group
Paco Nathan
Principal Developer Relations Engineer @ Senzing
Egor Kraev
Head of AI @ Wise
Monika Podsiadlo
Voice AI @ Google DeepMind
Gaurav Mittal
Staff Software Engineer @ Stripe
Aditya Gautam
Senior Machine Learning Engineer @ Meta
Weidong Yang
CEO @ Kineviz
Alessandro Negro
Chief Scientist @ GraphAware
Samuel Partee
CTO & Co-Founder @ Arcade AI
David Hughes
Principal Solution Architect - Engineering & AI @ Enterprise Knowledge
Brianna Connelly
VP of Data Science @ Filevine
Sebastian Kukla
Digital Transformation - North America @ RHI Magnesita
Sahil Khanna
Senior Machine Learning Engineer @ Adobe
Dimitrios Athanasakis
Principal AI Engineer @ AstraZeneca
Tanmay Chopra
Founder / CEO @ Emissary
Aisha Yusaf
Founder @ Orra
Rex Harris
Founder, AI Product Lead @ Agents of Change
Julia Gomes
Senior Product Manager, Gen AI @ Arize AI
Ron Chrisley
Professor of Cognitive Science and AI @ University of Sussex
Ezo Saleh
Founder @ Orra
Vaibhav Gupta
CEO @ Boundary ML
Tom Shapland
CEO @ Canonical AI
Ilya Reznik
Head of Edge ML @ Live View Technologies
Daniel Svonava
CEO & Co-founder @ Superlinked
Rahul Parundekar
Principal MLOps Engineer/Architect @ Tribe AI
Mohan Atreya
Chief Product Officer @ Rafay
Joshua Alphonse
Head of Product @ PremAI
Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

Agenda

Stage 1
Stage 2
Stage 3
3:00 PM, GMT
-
3:20 PM, GMT
Opening / Closing
Welcome to AI in Production 2025
Demetrios Brinkmann
3:25 PM, GMT
-
3:50 PM, GMT
Keynote
The LLM Guardrails Index: Benchmarking Responsible AI Deployment

As LLMs become more common, ensuring they're reliable, secure, and compliant is more important than ever. With many guardrail solutions popping up, how do AI engineers figure out which solutions work best for them?

In this talk, Shreya will cover the first-ever AI Guardrails Index: a thorough evaluation of top guardrail solutions. Based on a systematic assessment of six key AI risk categories, this benchmark helps AI developers, platform teams, and decision-makers better understand the landscape of LLM safety.

Key takeaways:

  • The current state of AI guardrails and why they matter for responsible AI use
  • How different guardrail solutions stack up on precision, recall, accuracy, and speed
  • Which guardrails work best for specific needs, from finance to healthcare to content filtering
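To make the benchmark metrics concrete, here is a minimal sketch (not from the talk) of how precision and recall might be computed for a guardrail, using a toy regex-based email detector as the stand-in guardrail and hypothetical labeled examples:

```python
import re

def pii_guardrail(text: str) -> bool:
    """Toy guardrail: flag text containing an email-like pattern."""
    return bool(re.search(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", text))

# (text, should_be_flagged) pairs -- hypothetical labeled examples
labeled = [
    ("Contact me at jane.doe@example.com", True),
    ("The quarterly report is attached", False),
    ("Send it to ops@internal.example.org", True),
    ("Email support if the job fails", False),
]

tp = sum(1 for t, y in labeled if pii_guardrail(t) and y)       # true positives
fp = sum(1 for t, y in labeled if pii_guardrail(t) and not y)   # false positives
fn = sum(1 for t, y in labeled if not pii_guardrail(t) and y)   # false negatives

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same precision/recall accounting applies whether the guardrail is a regex, a classifier, or an LLM judge; only the `pii_guardrail` function changes.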
Shreya Rajpal
3:55 PM, GMT
-
4:20 PM, GMT
Presentation
Graph Retrieval - Let Me Count The Ways

Whenever you read about taking Retrieval Augmented Generation beyond simple vector search on embeddings, graphs are almost sure to come up. But what graphs? Old-school knowledge graphs, with entities and their relationships, or document-centric graphs, with text snippets as nodes? And how do you use them to improve your retrieval? Nearest neighbors? PageRank? Something else?

I will provide an overview of what's happening in that space, including what I'm doing, and give you a tour of the different options, with their pros and cons.
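As one concrete illustration (my sketch, not the speaker's implementation), document-centric graph retrieval can be as simple as expanding the hits from vector search with their graph neighbors before reranking:

```python
# Toy document graph: nodes are snippet ids, edges link related snippets.
# Assumed example data -- a real system would build this from documents.
edges = {
    "doc1": ["doc2", "doc3"],
    "doc2": ["doc1", "doc4"],
    "doc3": ["doc1"],
    "doc4": ["doc2"],
}

def expand_seeds(seeds, hops=1):
    """Start from vector-search hits and pull in graph neighbors."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {n for node in frontier for n in edges.get(node, [])} - found
        found |= frontier
    return found

# Pretend vector search returned doc1; one hop adds its neighbors.
print(sorted(expand_seeds(["doc1"])))
```

PageRank-style approaches replace the fixed-hop expansion with a random-walk score over the same graph, trading simplicity for a global notion of relevance.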

Egor Kraev
4:25 PM, GMT
-
4:50 PM, GMT
Presentation
Bridging the gap between Model Development and AI Infrastructure

Large Language Models (LLMs) are transforming industries, but their success hinges not just on cutting-edge models, but on the ability to efficiently train, deploy, and manage them at scale. Training state-of-the-art models like GPT/LLAMA requires thousands of GPUs running for weeks, posing significant challenges for MLOps teams.

This session explores how LLMs are built, the scale of training required, and the growing infrastructure demands placed on MLOps teams. It will cover the key operational challenges of AI workloads, including distributed training, cost optimization, and multi-tenancy. Finally, the talk will highlight how Rafay’s GPU PaaS enables seamless AI infrastructure management, helping organizations scale AI efficiently without bottlenecks.

Attendees will gain a deeper understanding of LLM training, GPU workload management, and practical strategies to optimize AI infrastructure.

Mohan Atreya
4:55 PM, GMT
-
5:05 PM, GMT
Break
5:05 PM, GMT
-
5:30 PM, GMT
Presentation
Eval Driven Development: Best Practices and Pitfalls When Building with AI

Learn how the best companies use evals to guide AI development and build a virtuous cycle of product improvement!

We’ll cover the fundamentals of how to use evaluation-driven development to build reliable applications with Large Language Models (LLMs).

Building an AI application, RAG system, or agent involves many design choices. You have to choose between models, design prompts, expose tools to the model and build knowledge bases for RAG.

If you don’t have a good evaluation in place then it’s likely you’ll waste a lot of time making changes but not actually improving performance. Post-deployment, evaluation is essential to ensure that changes don’t introduce regressions to your product.

Using real-world case studies from AI teams at companies like Gusto, Filevine, and Fundrise, who are building production-grade agents and LLM applications, we’ll cover how to design your evaluators and use them as part of an iterative development process.

By the end of the session you should understand the pitfalls and best practices of constructing evaluators in practice, as well as the process of evaluation-driven AI development.
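As a minimal illustration of the eval-driven loop (a sketch with hypothetical names, not Humanloop's API), an evaluator can be as simple as a set of cases scored against the model to produce a pass rate you track across changes:

```python
def model(prompt: str) -> str:
    """Stub standing in for an LLM call."""
    return "Paris" if "capital of France" in prompt else "I don't know"

# Hypothetical eval cases: prompt plus expected behavior.
eval_cases = [
    {"prompt": "What is the capital of France?", "expect": "Paris"},
    {"prompt": "What is the capital of Atlantis?", "expect": "I don't know"},
]

def run_evals(cases):
    """Score every case and return the pass rate in [0, 1]."""
    results = [model(c["prompt"]) == c["expect"] for c in cases]
    return sum(results) / len(results)

print(f"pass rate: {run_evals(eval_cases):.0%}")
```

In practice the exact-match check is usually replaced by fuzzier scorers (string similarity, rubric-based LLM judges), but the loop is the same: change the prompt or model, rerun the evals, and only ship when the pass rate holds or improves.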

Raza Habib
Brianna Connelly
5:35 PM, GMT
-
6:00 PM, GMT
Presentation
Building Consumer Facing GenAI Chatbots: Lessons in AI Design, Scaling, and Brand Safety

Building a GenAI chatbot for millions of users? This session reveals the secret sauce: best practices in LLM orchestration, agentic workflows, and grounded responses, all while prioritizing brand safety. Learn key architectural decisions for balancing latency and quality, and discover strategies for scaling to production.

Afshaan Mazagonwalla
6:05 PM, GMT
-
6:35 PM, GMT
Panel Discussion
Building Platforms for Gen AI Workloads vs Traditional ML Workloads

As enterprises scale AI adoption, the infrastructure demands of GenAI workloads are pushing the limits of traditional ML platforms. This panel will explore the key differences in architecture, resource management, and operational challenges when building platforms for GenAI versus traditional ML. How do data, compute, and model lifecycle management change? What trade-offs exist between flexibility and control?

Ilya Reznik
Gaurav Mittal
Daniel Svonava
Julia Gomes
Rahul Parundekar
6:35 PM, GMT
-
6:45 PM, GMT
Break
6:45 PM, GMT
-
7:10 PM, GMT
Presentation
Unlocking AI Agents: Fixing Authorization to Get Real Work Done

This talk is about making AI agents truly useful by fixing how we handle authorization. Right now, we depend on API keys and static tokens stored in environment variables that tie actions to single users, which isn't flexible or secure for bigger operations. I'll cover why this holds us back from letting AI do important tasks, like sending emails or managing sensitive data, autonomously. We'll explore simple ways to update these systems, so AI can work for us without constant human intervention. This is all about moving beyond flashy demos to real-world, impactful AI applications.
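To make the idea of moving beyond static tokens concrete, here is a minimal sketch (hypothetical API, not Arcade AI's actual interface) of scope-based authorization for agent tool calls, where each action requires an explicitly granted scope:

```python
from dataclasses import dataclass, field

@dataclass
class AgentToken:
    """Hypothetical per-user token carrying explicit, granted scopes."""
    user: str
    scopes: set = field(default_factory=set)

def authorize(token: AgentToken, required_scope: str) -> bool:
    """Allow a tool call only if the token carries the needed scope."""
    return required_scope in token.scopes

token = AgentToken(user="alice", scopes={"email:read"})
print(authorize(token, "email:read"))   # granted scope
print(authorize(token, "email:send"))   # not granted: agent must request consent
```

The point of the pattern is that a denied check becomes a consent-request flow rather than a hard failure, so agents can escalate privileges per user and per action instead of inheriting one all-powerful API key.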

Samuel Partee
7:15 PM, GMT
-
7:40 PM, GMT
Presentation
Introducing the Prompt Engineering Toolkit

A well-crafted prompt is essential for obtaining accurate and relevant outputs from LLMs (Large Language Models). Prompt design enables users new to machine learning to control model output with minimal overhead.

To facilitate rapid iteration and experimentation with LLMs at Uber, we needed a centralized way to construct prompt templates, manage them, and execute them against various underlying LLMs to support LLM-powered tasks.

To meet these needs, we built a prompt engineering toolkit that offers standard strategies that encourage prompt engineers to develop well-crafted prompt templates.

The centralized prompt engineering toolkit enables the creation of effective prompts with system instructions, dynamic contextualization, massive batch offline generation (LLM inference), and evaluation of prompt responses. Furthermore, there’s a need for version control, collaboration, and robust safety measures (hallucination checks, standardized evaluation framework, and a safety policy) to ensure responsible AI usage.
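As an illustration of centralized, versioned templates with dynamic contextualization (a sketch with hypothetical names, not Uber's actual toolkit), a registry can be as small as a dict keyed by template name and version:

```python
from string import Template

# Hypothetical registry: (name, version) -> template.
# Versioning lets teams roll templates forward/back independently.
PROMPT_TEMPLATES = {
    ("summarize", "v2"): Template(
        "System: You are a concise assistant.\n"
        "Summarize the following text in $max_words words:\n$text"
    ),
}

def render_prompt(name: str, version: str, **params) -> str:
    """Look up a versioned template and fill in dynamic context."""
    return PROMPT_TEMPLATES[(name, version)].substitute(**params)

prompt = render_prompt("summarize", "v2", max_words=50, text="<document text>")
print(prompt)
```

A production toolkit layers the rest of the abstract on top of this core: batch offline generation iterates `render_prompt` over a dataset, and evaluation and safety checks run on the responses.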

Sishi Long
7:45 PM, GMT
-
7:55 PM, GMT
Break
7:55 PM, GMT
-
8:20 PM, GMT
Presentation
LLM in Large-Scale Recommendation Systems: Use Cases and Challenges

The advent of Large Language Models (LLMs) has significantly transformed the landscape of recommendation systems, marking a shift from traditional discriminative approaches to more generative paradigms. This transition has not only enhanced the performance of recommendation systems but also introduced a new set of challenges that need to be addressed. LLMs have several practical use cases in modern recommendation systems, including retrieval, ranking, embedding generation for users and items in diverse spaces, harmful content detection, user history representation, and interest exploration and exploitation.

However, integrating LLMs into recommendation systems is not without its hurdles. On the algorithmic front, issues such as bias, integrity, explainability, freshness, cold start, and the integration with discriminative models pose significant challenges. Additionally, there are numerous production deployment and development challenges, including training, inference, cost management, optimal resource utilization, latency, and monitoring. Beyond these, there are unforeseen issues that often remain hidden during A/B testing but become apparent once the model is deployed in a production environment. These include impact dilution, discrepancies between pre-test and backtest results, and model dependency, all of which can affect the overall effectiveness and reliability of the recommendation system. Addressing these challenges is crucial for harnessing the full potential of LLMs in recommendation systems.

Aditya Gautam
8:25 PM, GMT
-
8:50 PM, GMT
Presentation
Doxing the Dark Web

The Dark Web: an estimated $3T USD flows annually through shell corps and tax havens worldwide -- serving as the perpetua mobilia for oligarchs, funding illegal weapons transfers, mercenaries, human trafficking at scale, anti-democracy campaigns, cyber attacks at global scale, even illegal fishing fleets. Tendrils of kleptocracy extend through the heart of London, reaching into many of the VC firms in Silicon Valley, and now into the White House.

The people who "catch bad guys" – investigative journalists, regulators, gov agencies – leverage AI apps to contend with the overwhelming data volumes. Few of those who do "bad guy hunting" get to speak at tech conferences. However, our team provides core technology for this work, and we can use open source, open models, and open data to illustrate how technology gets used to track the moves of the world's worst organized crime, and how to fight the oligarchs who use complex networks to hide their grift.

This talk explores known cases, tradecraft employed, and open data sources for fighting against kleptocracy. Moreover, we'll look at where AI and Data professionals are very much needed, where you can get involved.

Paco Nathan
8:55 PM, GMT
-
9:20 PM, GMT
Closing Keynote
The AI Developer Experience Sucks so Let's Fix it – The Story of Modal

Erik will talk about how he went down a deep infrastructure rabbit hole to fix the developer experience of working with the cloud and GPUs. This involved building a custom file system, a custom scheduler, and much more.

Erik Bernhardsson

Sponsors

Gold
Silver
Event has finished
March 12, 3:00 PM, GMT
Online
Organized by
MLOps Community