MLOps Community
AI in Production 2025
LIVESTREAM

# AI in Production 2025

We’re Back for Round Two!

The AI in Production 2025 event builds on last year's momentum, with a sharper focus on the toughest challenges of deploying AI at scale.

LLMs and AI applications are advancing rapidly, but the production hurdles haven't gone away. This year we're tackling the hard stuff: managing costs, meeting latency requirements, debugging complex systems, building trust in outputs, and, of course, agents.

Hear straight from the people making it happen and see how they’re solving problems you might be facing today.




Speakers

Shreya Rajpal
Creator @ Guardrails AI
Afshaan Mazagonwalla
AI Engineer @ Google Cloud Consulting
Vasu Sharma
Applied Research Scientist @ Meta (FAIR)
Amy Hodler
Executive Director @ GraphGeeks
Jessica Talisman
Senior Information Architect @ Adobe
Raza Habib
CEO and Co-founder @ Humanloop
Erik Bernhardsson
Founder @ Modal Labs
Hala Nelson
Author and Professor of Mathematics @ James Madison University
Merrell Stone
Research, Strategic Foresight and Human Systems @ Avanade
Bassey Etim
Senior Director - Content Strategy @ Pluralsight
Erica Greene
Director of Engineering, Machine Learning @ Yahoo
Guanhua Wang
Senior Researcher @ Microsoft
Sishi Long
Staff Software Engineer @ Uber
Michael Gschwind
Distinguished Engineer @ NVIDIA
Chiara Caratelli
Data Scientist @ Prosus Group
Paco Nathan
Principal Developer Relations Engineer @ Senzing
Egor Kraev
Head of AI @ Wise
Monika Podsiadlo
Voice AI @ Google DeepMind
Gaurav Mittal
Staff Software Engineer @ Stripe
Aditya Gautam
Senior Machine Learning Engineer @ Meta
Weidong Yang
CEO @ Kineviz
Alessandro Negro
Chief Scientist @ GraphAware
Samuel Partee
CTO & Co-Founder @ Arcade AI
David Hughes
Principal Solution Architect - Engineering & AI @ Enterprise Knowledge
Brianna Connelly
VP of Data Science @ Filevine
Sebastian Kukla
Digital Transformation - North America @ RHI Magnesita
Sahil Khanna
Senior Machine Learning Engineer @ Adobe
Dimitrios Athanasakis
Principal AI Engineer @ AstraZeneca
Tanmay Chopra
Founder / CEO @ Emissary
Aisha Yusaf
Founder @ Orra
Rex Harris
Founder, AI Product Lead @ Agents of Change
Julia Gomes
Senior Product Manager, Gen AI @ Arize AI
Ron Chrisley
Professor of Cognitive Science and AI @ University of Sussex
Ezo Saleh
Founder @ Orra
Vaibhav Gupta
CEO @ Boundary ML
Tom Shapland
CEO @ Canonical AI
Ilya Reznik
Head of Edge ML @ Live View Technologies
Daniel Svonava
CEO & Co-founder @ Superlinked
Rahul Parundekar
Principal MLOps Engineer/Architect @ Tribe AI
Mohan Atreya
Chief Product Officer @ Rafay
Joshua Alphonse
Head of Product @ PremAI
Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

Agenda

Stage 1
Stage 2
Stage 3
3:00 PM, GMT
-
3:20 PM, GMT
Opening / Closing
Welcome to AI in Production 2025
Demetrios Brinkmann
3:25 PM, GMT
-
3:50 PM, GMT
Keynote
The LLM Guardrails Index: Benchmarking Responsible AI Deployment

As LLMs become more common, ensuring they're reliable, secure, and compliant is more important than ever. With many guardrail solutions popping up, how do AI engineers figure out which solutions work best for them?

In this talk, Shreya will cover the first-ever AI Guardrails Index: a thorough evaluation of top guardrail solutions. Based on a systematic assessment of six key AI risk categories, this benchmark helps AI developers, platform teams, and decision-makers better understand the landscape of LLM safety.

Key takeaways:

  • The current state of AI guardrails and why they matter for responsible AI use
  • How different guardrail solutions stack up on precision, recall, accuracy, and speed
  • Which guardrails work best for specific needs, from finance to healthcare to content filtering
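To make the benchmark metrics concrete, here is a minimal sketch (not from the talk) of how precision and recall might be computed for a guardrail, using a toy regex-based email detector as the stand-in guardrail and hypothetical labeled examples:

```python
import re

def pii_guardrail(text: str) -> bool:
    """Toy guardrail: flag text containing an email-like pattern."""
    return bool(re.search(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", text))

# (text, should_be_flagged) pairs -- hypothetical labeled examples
labeled = [
    ("Contact me at jane.doe@example.com", True),
    ("The quarterly report is attached", False),
    ("Send it to ops@internal.example.org", True),
    ("Email support if the job fails", False),
]

tp = sum(1 for t, y in labeled if pii_guardrail(t) and y)       # true positives
fp = sum(1 for t, y in labeled if pii_guardrail(t) and not y)   # false positives
fn = sum(1 for t, y in labeled if not pii_guardrail(t) and y)   # false negatives

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same precision/recall accounting applies whether the guardrail is a regex, a classifier, or an LLM judge; only the `pii_guardrail` function changes.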
Shreya Rajpal
3:55 PM, GMT
-
4:20 PM, GMT
Presentation
Graph Retrieval - Let Me Count The Ways

Whenever you read about taking Retrieval Augmented Generation beyond simple vector search on embeddings, graphs are almost sure to come up. But what graphs? Old-school knowledge graphs, with entities and their relationships, or document-centric graphs, with text snippets as nodes? And how do you use them to improve your retrieval? Nearest neighbors? PageRank? Something else?

I will provide an overview of what's happening in that space, including what I'm doing, and give you a tour of the different options, with their pros and cons.
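As one concrete illustration (my sketch, not the speaker's implementation), document-centric graph retrieval can be as simple as expanding the hits from vector search with their graph neighbors before reranking:

```python
# Toy document graph: nodes are snippet ids, edges link related snippets.
# Assumed example data -- a real system would build this from documents.
edges = {
    "doc1": ["doc2", "doc3"],
    "doc2": ["doc1", "doc4"],
    "doc3": ["doc1"],
    "doc4": ["doc2"],
}

def expand_seeds(seeds, hops=1):
    """Start from vector-search hits and pull in graph neighbors."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {n for node in frontier for n in edges.get(node, [])} - found
        found |= frontier
    return found

# Pretend vector search returned doc1; one hop adds its neighbors.
print(sorted(expand_seeds(["doc1"])))
```

PageRank-style approaches replace the fixed-hop expansion with a random-walk score over the same graph, trading simplicity for a global notion of relevance.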

Egor Kraev
4:25 PM, GMT
-
4:50 PM, GMT
Presentation
Bridging the gap between Model Development and AI Infrastructure

Large Language Models (LLMs) are transforming industries, but their success hinges not just on cutting-edge models, but on the ability to efficiently train, deploy, and manage them at scale. Training state-of-the-art models like GPT/LLAMA requires thousands of GPUs running for weeks, posing significant challenges for MLOps teams.

This session explores how LLMs are built, the scale of training required, and the growing infrastructure demands placed on MLOps teams. It will cover the key operational challenges of AI workloads, including distributed training, cost optimization, and multi-tenancy. Finally, the talk will highlight how Rafay’s GPU PaaS enables seamless AI infrastructure management, helping organizations scale AI efficiently without bottlenecks.

Attendees will gain a deeper understanding of LLM training, GPU workload management, and practical strategies to optimize AI infrastructure.

Mohan Atreya
4:55 PM, GMT
-
5:05 PM, GMT
Break
5:05 PM, GMT
-
5:30 PM, GMT
Presentation
Eval Driven Development: Best Practices and Pitfalls When Building with AI

Learn how the best companies use evals to guide AI development and build a virtuous cycle of product improvement!

We’ll cover the fundamentals of how to use evaluation-driven development to build reliable applications with Large Language Models (LLMs).

Building an AI application, RAG system, or agent involves many design choices. You have to choose between models, design prompts, expose tools to the model and build knowledge bases for RAG.

If you don’t have a good evaluation in place then it’s likely you’ll waste a lot of time making changes but not actually improving performance. Post-deployment, evaluation is essential to ensure that changes don’t introduce regressions to your product.

Using real-world case studies from AI teams at companies like Gusto, Filevine, and Fundrise, who are building production-grade agents and LLM applications, we’ll cover how to design your evaluators and use them as part of an iterative development process.

By the end of the session you should understand the pitfalls and best practices of constructing evaluators in practice, as well as the process of evaluation-driven AI development.
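As a minimal illustration of the eval-driven loop (a sketch with hypothetical names, not Humanloop's API), an evaluator can be as simple as a set of cases scored against the model to produce a pass rate you track across changes:

```python
def model(prompt: str) -> str:
    """Stub standing in for an LLM call."""
    return "Paris" if "capital of France" in prompt else "I don't know"

# Hypothetical eval cases: prompt plus expected behavior.
eval_cases = [
    {"prompt": "What is the capital of France?", "expect": "Paris"},
    {"prompt": "What is the capital of Atlantis?", "expect": "I don't know"},
]

def run_evals(cases):
    """Score every case and return the pass rate in [0, 1]."""
    results = [model(c["prompt"]) == c["expect"] for c in cases]
    return sum(results) / len(results)

print(f"pass rate: {run_evals(eval_cases):.0%}")
```

In practice the exact-match check is usually replaced by fuzzier scorers (string similarity, rubric-based LLM judges), but the loop is the same: change the prompt or model, rerun the evals, and only ship when the pass rate holds or improves.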

Raza Habib
Brianna Connelly
5:35 PM, GMT
-
6:00 PM, GMT
Presentation
Building Consumer Facing GenAI Chatbots: Lessons in AI Design, Scaling, and Brand Safety

Building a GenAI chatbot for millions of users? This session reveals the secret sauce: best practices in LLM orchestration, agentic workflows, and grounded responses, all while prioritizing brand safety. Learn key architectural decisions for balancing latency and quality, and discover strategies for scaling to production.

Afshaan Mazagonwalla
6:05 PM, GMT
-
6:35 PM, GMT
Panel Discussion
Building Platforms for Gen AI Workloads vs Traditional ML Workloads

As enterprises scale AI adoption, the infrastructure demands of GenAI workloads are pushing the limits of traditional ML platforms. This panel will explore the key differences in architecture, resource management, and operational challenges when building platforms for GenAI versus traditional ML. How do data, compute, and model lifecycle management change? What trade-offs exist between flexibility and control?

Ilya Reznik
Gaurav Mittal
Daniel Svonava
Julia Gomes
Rahul Parundekar
6:35 PM, GMT
-
6:45 PM, GMT
Break
6:45 PM, GMT
-
7:10 PM, GMT
Presentation
Unlocking AI Agents: Fixing Authorization to Get Real Work Done

This talk is about making AI agents truly useful by fixing how we handle authorization. Right now, we depend on API keys and static tokens stored in environment variables that tie actions to single users, which isn't flexible or secure for bigger operations. I'll cover why this holds us back from letting AI do important tasks, like sending emails or managing sensitive data, autonomously. We'll explore simple ways to update these systems, so AI can work for us without constant human intervention. This is all about moving beyond flashy demos to real-world, impactful AI applications.
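To make the idea of moving beyond static tokens concrete, here is a minimal sketch (hypothetical API, not Arcade AI's actual interface) of scope-based authorization for agent tool calls, where each action requires an explicitly granted scope:

```python
from dataclasses import dataclass, field

@dataclass
class AgentToken:
    """Hypothetical per-user token carrying explicit, granted scopes."""
    user: str
    scopes: set = field(default_factory=set)

def authorize(token: AgentToken, required_scope: str) -> bool:
    """Allow a tool call only if the token carries the needed scope."""
    return required_scope in token.scopes

token = AgentToken(user="alice", scopes={"email:read"})
print(authorize(token, "email:read"))   # granted scope
print(authorize(token, "email:send"))   # not granted: agent must request consent
```

The point of the pattern is that a denied check becomes a consent-request flow rather than a hard failure, so agents can escalate privileges per user and per action instead of inheriting one all-powerful API key.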

Samuel Partee
7:15 PM, GMT
-
7:40 PM, GMT
Presentation
Introducing the Prompt Engineering Toolkit

A well-crafted prompt is essential for obtaining accurate and relevant outputs from LLMs (Large Language Models). Prompt design enables users new to machine learning to control model output with minimal overhead.

To facilitate rapid iteration and experimentation with LLMs at Uber, we needed a centralized way to construct prompt templates, manage them, and execute them against various underlying LLMs to support LLM-powered tasks.

To meet these needs, we built a prompt engineering toolkit that offers standard strategies that encourage prompt engineers to develop well-crafted prompt templates.

The centralized prompt engineering toolkit enables the creation of effective prompts with system instructions, dynamic contextualization, massive batch offline generation (LLM inference), and evaluation of prompt responses. Furthermore, there’s a need for version control, collaboration, and robust safety measures (hallucination checks, standardized evaluation framework, and a safety policy) to ensure responsible AI usage.
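As an illustration of centralized, versioned templates with dynamic contextualization (a sketch with hypothetical names, not Uber's actual toolkit), a registry can be as small as a dict keyed by template name and version:

```python
from string import Template

# Hypothetical registry: (name, version) -> template.
# Versioning lets teams roll templates forward/back independently.
PROMPT_TEMPLATES = {
    ("summarize", "v2"): Template(
        "System: You are a concise assistant.\n"
        "Summarize the following text in $max_words words:\n$text"
    ),
}

def render_prompt(name: str, version: str, **params) -> str:
    """Look up a versioned template and fill in dynamic context."""
    return PROMPT_TEMPLATES[(name, version)].substitute(**params)

prompt = render_prompt("summarize", "v2", max_words=50, text="<document text>")
print(prompt)
```

A production toolkit layers the rest of the abstract on top of this core: batch offline generation iterates `render_prompt` over a dataset, and evaluation and safety checks run on the responses.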

Sishi Long
7:45 PM, GMT
-
7:55 PM, GMT
Break
7:55 PM, GMT
-
8:20 PM, GMT
Presentation
LLM in Large-Scale Recommendation Systems: Use Cases and Challenges

The advent of Large Language Models (LLMs) has significantly transformed the landscape of recommendation systems, marking a shift from traditional discriminative approaches to more generative paradigms. This transition has not only enhanced the performance of recommendation systems but also introduced a new set of challenges that need to be addressed. LLMs have several practical use cases in modern recommendation systems, including retrieval, ranking, embedding generation for users and items in diverse spaces, harmful content detection, user history representation, and interest exploration and exploitation.

However, integrating LLMs into recommendation systems is not without its hurdles. On the algorithmic front, issues such as bias, integrity, explainability, freshness, cold start, and the integration with discriminative models pose significant challenges. Additionally, there are numerous production deployment and development challenges, including training, inference, cost management, optimal resource utilization, latency, and monitoring. Beyond these, there are unforeseen issues that often remain hidden during A/B testing but become apparent once the model is deployed in a production environment. These include impact dilution, discrepancies between pre-test and backtest results, and model dependency, all of which can affect the overall effectiveness and reliability of the recommendation system. Addressing these challenges is crucial for harnessing the full potential of LLMs in recommendation systems.

Aditya Gautam
8:25 PM, GMT
-
8:50 PM, GMT
Presentation
Doxing the Dark Web

The Dark Web: an estimated $3T USD flows annually through shell corps and tax havens worldwide -- serving as the perpetua mobilia for oligarchs, funding illegal weapons transfers, mercenaries, human trafficking at scale, anti-democracy campaigns, cyber attacks at global scale, even illegal fishing fleets. Tendrils of kleptocracy extend through the heart of London, reaching into many of the VC firms in Silicon Valley, and now into the White House.

The people who "catch bad guys" – investigative journalists, regulators, gov agencies – leverage AI apps to contend with the overwhelming data volumes. Few of those who do "bad guy hunting" get to speak at tech conferences. However, our team provides core technology for this work, and we can use open source, open models, and open data to illustrate how technology gets used to track the moves of the world's worst organized crime, and how to fight the oligarchs who use complex networks to hide their grift.

This talk explores known cases, tradecraft employed, and open data sources for fighting against kleptocracy. Moreover, we'll look at where AI and Data professionals are very much needed, where you can get involved.

Paco Nathan
8:55 PM, GMT
-
9:20 PM, GMT
Closing Keynote
The AI Developer Experience Sucks so Let's Fix it – The Story of Modal

Erik will talk about how he went down a deep infrastructure rabbit hole to fix the developer experience of working with the cloud and GPUs. This involved building a custom file system, a custom scheduler, and much more.

Erik Bernhardsson

Sponsors

Gold
Silver
Event has finished
March 12, 3:00 PM, GMT
Online
Organized by
MLOps Community