AI in Production 2025 | Agentic AI Foundation

AI in Production 2025

Eval Driven Development: Best Practices and Pitfalls When Building with AI // Raza Habib & Brianna Connelly // AI in Production 2025

Learn how the best companies use evals to guide AI development and build a virtuous cycle of product improvement! We’ll cover the fundamentals of using evaluation-driven development to build reliable applications with Large Language Models (LLMs). Building an AI application, RAG system, or agent involves many design choices: you have to choose between models, design prompts, expose tools to the model, and build knowledge bases for RAG. Without good evaluation in place, you are likely to waste a lot of time making changes that do not actually improve performance. Post-deployment, evaluation is essential to ensure that changes don’t introduce regressions to your product. Using real-world case studies from AI teams at companies like Gusto, Filevine and Fundrise, who are building production-grade agents and LLM applications, we’ll cover how to design your evaluators and use them as part of an iterative development process. By the end of the session you should understand the pitfalls and best practices of constructing evaluators in practice, as well as the process of evaluation-driven AI development.

Raza Habib & Brianna Connelly · Mar 13th, 2025

33:37

Video

The LLM Guardrails Index: Benchmarking Responsible AI Deployment // Shreya Rajpal // AI in Production 2025

As LLMs become more common, ensuring they're reliable, secure, and compliant is more important than ever. With many guardrail solutions popping up, how do AI engineers figure out which solutions work best for them? In this talk, Shreya will cover the first-ever AI Guardrails Index, a thorough evaluation of top guardrail solutions. Based on a systematic assessment of 6 key AI risk categories, this benchmark helps AI developers, platform teams, and decision-makers better understand the landscape of LLM safety.

Key takeaways:
- The current state of AI guardrails and why they matter for responsible AI use
- How different guardrail solutions stack up in terms of precision, recall, accuracy, and speed
- Which guardrails work best for specific needs, from finance to healthcare to content filtering
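
For readers unfamiliar with the metrics named in this abstract, here is a minimal sketch of how precision, recall, and accuracy could be computed for a single guardrail. It is an illustration only, not the Index's methodology: the `Scores` class, the `score_guardrail` function, and the sample labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    accuracy: float


def score_guardrail(predicted: list[bool], actual: list[bool]) -> Scores:
    """Compare guardrail verdicts (True = flagged) against ground-truth labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # correctly flagged
    fp = sum(p and not a for p, a in pairs)      # flagged but harmless
    fn = sum(not p and a for p, a in pairs)      # missed a real violation
    tn = sum(not p and not a for p, a in pairs)  # correctly passed
    total = tp + fp + fn + tn
    return Scores(
        precision=tp / (tp + fp) if tp + fp else 0.0,
        recall=tp / (tp + fn) if tp + fn else 0.0,
        accuracy=(tp + tn) / total if total else 0.0,
    )


# Hypothetical verdicts for five prompts (assumed data, for illustration only).
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
scores = score_guardrail(predicted, actual)
```

Precision penalizes false alarms and recall penalizes misses, so a guardrail that looks strong on one can be weak on the other.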

# LLMs

# Guardrails

# AI Risks

Shreya Rajpal · Mar 13th, 2025

34:05

Video

The AI Developer Experience Sucks so Let's Fix it // Erik Bernhardsson // AI in Production

Erik talks about how he went into a deep infrastructure rabbit hole to fix the developer experience of working with the cloud and GPUs. This involved building a custom file system, a custom scheduler, and much more.

# AI Developer Experience

# Cloud

# GPU

# Modal Labs

Erik Bernhardsson · Mar 13th, 2025

16:50

Video

Bridging the Gap between Model Development and AI Infrastructure // Mohan Atreya // AI in Production 2025

Large Language Models (LLMs) are transforming industries, but their success hinges not just on cutting-edge models, but on the ability to efficiently train, deploy, and manage them at scale. Training state-of-the-art models like GPT/LLAMA requires thousands of GPUs running for weeks, posing significant challenges for MLOps teams. This session explores how LLMs are built, the scale of training required, and the growing infrastructure demands placed on MLOps teams. It will cover the key operational challenges of AI workloads, including distributed training, cost optimization, and multi-tenancy. Finally, the talk will highlight how Rafay’s GPU PaaS enables seamless AI infrastructure management, helping organizations scale AI efficiently without bottlenecks. Attendees will gain a deeper understanding of LLM training, GPU workload management, and practical strategies to optimize AI infrastructure.

# LLM

# GPU

# Rafay

Mohan Atreya · Mar 13th, 2025

32:21

Video

Challenges of Working with Voice AI Agents // Panel // AI in Production 2025

Voice AI agents sound great in theory but in practice? They come with a whole new set of challenges—latency, accuracy, real-time processing, handling ambiguity, and making them feel actually useful. This panel digs into the gritty realities of building and scaling voice agents that don’t just talk but truly deliver. An MLOps Community Production sponsored by Humanloop & Rafay

# Voice AI

# AI Agents

Monika Podsiadlo, Chiara Caratelli, Rex Harris & 1 more speaker · Mar 14th, 2025

37:51

Video

Doxing the Dark Web // Paco Nathan // AI in Production 2025

The Dark Web: an estimated $3T USD flows annually through shell corps and tax havens worldwide -- serving as the perpetua mobilia for oligarchs, funding illegal weapons transfers, mercenaries, human trafficking at scale, anti-democracy campaigns, cyber attacks at global scale, even illegal fishing fleets. Tendrils of kleptocracy extend through the heart of London, reaching into many of the VC firms in Silicon Valley, and now into the White House. The people who "catch bad guys" – investigative journalists, regulators, gov agencies – leverage AI apps to contend with the overwhelming data volumes. Few of those who do "bad guy hunting" get to speak at tech conferences. However, our team provides core technology for this work, and we can use open source, open models, and open data to illustrate. How technology gets used to stick the moves of the world's worst organized crime, how to fight against the oligarchs who use complex networks to hide their grift. This talk explores known cases, tradecraft employed, and open data sources for fighting against kleptocracy. Moreover, we'll look at where AI and Data professionals are very much needed, where you can get involved.

# Dark Web

# Cyber Attacks

# Senzing

Paco Nathan · Mar 14th, 2025

31:20

Video

Unlocking AI Agents: Fixing Authorization to Get Real Work Done // Sam Partee // AI in Production 2025

This talk is about making AI agents truly useful by fixing how we handle authorization. Right now, we depend on API keys and static tokens stored in environment variables that tie actions to single users, which isn't flexible or secure for bigger operations. I'll cover why this holds us back from letting AI do important tasks, like sending emails or managing sensitive data, autonomously. We'll explore simple ways to update these systems, so AI can work for us without constant human intervention. This is all about moving beyond flashy demos to real-world, impactful AI applications.

# AI Agents

# AI Applications

# Arcade.dev

Samuel Partee · Mar 14th, 2025

28:37

Video

Advancing GraphRAG: Multimodal Integration with Associative Intelligence // Amy Hodler & David Hughes // AI in Production 2025

Integrating graphs with RAG processes has demonstrated clear benefits in improving the accuracy and explainability of GenAI. Graphs enhance the semantic capability of vector searches with more global enrichment and domain-specific grounding. The increasing adoption of GraphRAG reflects its value as shown in numerous blogs, GitHub projects, research, and formal articles. But graphs are fiddly and an iterative approach is almost always required. Today’s GraphRAG approaches focus on text and lexical graphs. However, non-text data is dense with latent signals that we currently just toss out. Integrating information from images and audio would provide an extremely rich layer of context to agentic workflows. The next major advance in GraphRAG will be incorporating all the semantic signals latent in images and audio. This session focuses on multimodal GraphRAG, or mmGraphRAG. mmGraphRAG represents a transformative step forward in bridging multimodal data through innovative search and analytics frameworks. We’ll demonstrate how, by integrating the semantic richness of images and text with the contextual reasoning power of graphs, mmGraphRAG provides a comprehensive, explainable, and actionable approach to solving complex data challenges. You’ll learn how to incorporate images into GraphRAG and customize graph schemas, as well as search that combines visual elements. We’ll walk you through the high-level architecture and the use of associative intelligence to transform search and analytics. Notebooks that illustrate creating embeddings and creating a multimodal graph from image decomposition will be provided so you can explore how mmGraphRAG can be applied to specific domains. We’ll also leave time to discuss the implications of adding graph pattern analytics to images.

# RAG

# GenAI

# GitHub

Amy Hodler & David Hughes · Mar 14th, 2025

34:35

Video

Small Language Model - From Experiments to Production // Joshua Alphonse // AI in Production 2025

As LLMs become widespread, enterprises build AI apps, often exposing sensitive data to centralized service providers and getting locked into their models. Smaller, specialized models, by contrast, can cut costs by up to 70%. In this talk, I'll quickly walk you through what goes into building production-ready SLMs.

# LLM

# AI Apps

# PremAI

Joshua Alphonse · Mar 14th, 2025

31:12

Video

LLM in Large-Scale Recommendation Systems // Aditya Gautam // AI in Production

The advent of Large Language Models (LLMs) has significantly transformed the landscape of recommendation systems, marking a shift from traditional discriminative approaches to more generative paradigms. This transition has not only enhanced the performance of recommendation systems but also introduced a new set of challenges that need to be addressed. LLMs have several practical use cases in modern recommendation systems, including retrieval, ranking, embedding generation for users and items in diverse spaces, harmful content detection, user history representation, and interest exploration and exploitation. However, integrating LLMs into recommendation systems is not without its hurdles. On the algorithmic front, issues such as bias, integrity, explainability, freshness, cold start, and the integration with discriminative models pose significant challenges. Additionally, there are numerous production deployment and development challenges, including training, inference, cost management, optimal resource utilization, latency, and monitoring. Beyond these, there are unforeseen issues that often remain hidden during A/B testing but become apparent once the model is deployed in a production environment. These include impact dilution, discrepancies between pre-test and backtest results, and model dependency, all of which can affect the overall effectiveness and reliability of the recommendation system. Addressing these challenges is crucial for harnessing the full potential of LLMs in recommendation systems.

# LLM

# AI Systems

# Meta

Aditya Gautam · Mar 17th, 2025

32:48

Video

Consumer Facing GenAI Chatbots: Lessons in AI Design, Scaling & Brand Safety // Afshaan Mazagonwalla

Building a GenAI chatbot for millions of users? This session reveals the secret sauce: best practices in LLM orchestration, agentic workflows, and grounded responses, all while prioritizing brand safety. Learn key architectural decisions for balancing latency and quality, and discover strategies for scaling to production.

# Google

# ChatBot

Afshaan Mazagonwalla · Mar 17th, 2025

32:58

Video

Intentional Arrangement: From Digital Hellscape to Information Nirvana // Jessica Talisman

Humans have catalogued information for more than 3,000 years, a practice that has evolved as technology has advanced. The Library and Information Science discipline is responsible for building systems for organizing data, to serve our physical and digital knowledge domains. How have librarians sustained analog and digital repositories? Intentional arrangement. With the current wave of artificial intelligence (AI), many organizations are struggling with poor data management. Welcome to the digital hellscape, rife with dirty data that is un-curated, unstructured, undefined and ambiguous. To emerge from this hellscape, look towards the librarians, who can show you the way. Controlled vocabularies, taxonomies, thesauri, ontologies and knowledge graphs have all emerged from the librarian’s toolbox. In kind, AI performance is optimized when trained on the same, intentionally arranged, structured data. Harmonious data ecosystems, optimized for human and machine, are our information nirvana and can be achieved with intentional arrangement.

# Adobe

# Data Management

Jessica Talisman · Mar 17th, 2025

30:44

Video

AI Powered Digital Twins of Real Organizations- Aspiration and Reality // Hala Nelson

Because Data and AI have historically evolved within relatively separate communities, their capabilities, benefits, and adoption strategies are valued differently by different work teams and investment decision makers. Many of us are currently attempting to harness the power of AI technologies at large complex organizations, or small ones for that matter. Initiatives span a wide range of interests within an organization: AI specialists, data engineers, IT departments, strategists, ethicists, executives, and the people on the ground. How does an implementation team guide a 32,000-person institution to optimally adopt AI within the very short attention span of an executive who wants an immediate return on investment? Is striking a deal with Microsoft Copilot or OpenAI enough? Can resources be justified for the ambitious goal of creating a digital twin of the processes and systems of an entire organization where AI can be applied to drive efficiencies and improvements? Are the required technologies, expertise, and resources available? In this talk I will present my experience of what it takes to move from aspiration to an implementable reality: from math to data to strategy to people to everything in between. We’ll also try to answer the question on everyone’s mind: will we eventually succeed, or will all our efforts end up in the wasteland of failed projects, efforts, funding, and time?

# DATA

# Processes

# AI

Hala Nelson · Mar 17th, 2025

25:43

Video

LLMs in Production - How to Keep Them from Breaking // Vaibhav Gupta // AI in Production 2025

Deploying Large Language Models (LLMs) in production brings a host of challenges well beyond prompt engineering. Once they’re live, the smallest oversight—like a malformed API call or unexpected user input—can cause failures you never saw coming. In this talk, Vaibhav Gupta will share proven strategies and practical tooling to keep LLMs robust in real-world environments. You’ll learn about structured prompting, dynamic routing with fallback handlers, and data-driven guardrails—all aimed at catching errors before they break your application. You’ll also hear why naïve use of JSON can reduce a model’s accuracy, and discover when it’s wise to push back on standard serialization in favor of more flexible output formats. Whether you’re processing 100+ page bank statements, analyzing user queries, or summarizing critical healthcare data, you’ll not only understand how to keep LLMs from “breaking,” but also how to design AI-driven solutions that scale gracefully alongside evolving user needs.
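
As a rough illustration of the "fallback handlers" idea mentioned in this abstract, the sketch below tries a list of handlers in order and accepts the first reply that passes validation. It is an assumption-laden toy, not the speaker's tooling: `parse_amount`, `route_with_fallback`, and the two stand-in handlers are hypothetical, and the handlers return canned strings instead of calling a model.

```python
import json
from typing import Callable, Optional


def parse_amount(raw: str) -> Optional[float]:
    """Validate a model reply; return the amount, or None if it is malformed."""
    try:
        data = json.loads(raw)
        return float(data["amount"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None


def route_with_fallback(prompt: str, handlers: list[Callable[[str], str]]) -> float:
    """Try each handler in order and return the first reply that validates."""
    for handler in handlers:
        try:
            amount = parse_amount(handler(prompt))
        except Exception:
            # A handler that raises (timeout, API error) is skipped like a bad reply.
            continue
        if amount is not None:
            return amount
    raise RuntimeError("all handlers failed or returned malformed output")


def flaky_primary(prompt: str) -> str:
    """Hypothetical primary model that returns a truncated reply."""
    return '{"amount": '


def stable_fallback(prompt: str) -> str:
    """Hypothetical fallback model that returns well-formed output."""
    return '{"amount": 12.5}'


result = route_with_fallback("Extract the total.", [flaky_primary, stable_fallback])
```

Validating each reply before accepting it is what lets the router catch a malformed response instead of passing it downstream.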

# BAML

# LLMs

Vaibhav Gupta · Mar 17th, 2025

30:40

Video

Introducing the Prompt Engineering Toolkit // Sishi Long // AI in Production 2025

A well-crafted prompt is essential for obtaining accurate and relevant outputs from LLMs (Large Language Models). Prompt design enables users new to machine learning to control model output with minimal overhead. To facilitate rapid iteration and experimentation of LLMs at Uber, there was a need for centralization to seamlessly construct prompt templates, manage them, and execute them against various underlying LLMs to take advantage of LLM support tasks. To meet these needs, we built a prompt engineering toolkit that offers standard strategies that encourage prompt engineers to develop well-crafted prompt templates. The centralized prompt engineering toolkit enables the creation of effective prompts with system instructions, dynamic contextualization, massive batch offline generation (LLM inference), and evaluation of prompt responses. Furthermore, there’s a need for version control, collaboration, and robust safety measures (hallucination checks, standardized evaluation framework, and a safety policy) to ensure responsible AI usage.

Sishi Long · Mar 17th, 2025

35:00

Video

Leveraging Knowledge Graphs and LLMs for Enhanced Criminal Intelligence // Alessandro Negro

This talk presents a three-step process that combines knowledge graphs with large language models (LLMs) to revolutionize how law enforcement agencies gather, analyze, and share criminal intelligence. This approach addresses critical challenges in modern policing: data silos, investigation complexity, and the need for transparent, explainable intelligence sharing.

# Graphs

# LLMs

Alessandro Negro · Mar 21st, 2025

32:26

Video

Wrangling Wild Agents // Ezo Saleh & Aisha Yusaf // AI in Production

Like brilliant but untamed minds, agentic applications in production present a unique challenge: they solve problems in revolutionary ways but can be wildly unpredictable. The art of deploying these free spirits requires a delicate balance between autonomy and reliability. At Orra, we've developed a "glue layer" that acts as a skilled wrangler, ensuring reliability while preserving the agents' freedom in production environments. We'll explore its architecture including our approach to adaptive execution planning, and how we enhance domain understanding.

# Agents

# Autonomy

# Domain

Ezo Saleh & Aisha Yusaf · Mar 21st, 2025

26:17

Video

Graph Retrieval - Let Me Count The Ways // Egor Kraev // AI in Production

Whenever you read about taking Retrieval Augmented Generation beyond simple vector search on embeddings, graphs are almost sure to come up. But what graphs? Old-school knowledge graphs, with entities and their relationships, or document-centric graphs, with text snippets as nodes? And how do you use them to improve your retrieval? Nearest neighborhood? PageRank? Something else? I will provide an overview of what's happening in that space, including what I'm doing, and give you a tour of the different options, with their pros and cons.

# Graph

# Wise

Egor Kraev · Mar 21st, 2025

25:35

Video

Considerations for the Acquisition of a Chatbot // Bassey Etim & Erica Greene // AI in Production

In a world increasingly saturated with AI-driven applications, businesses face mounting pressure to integrate chatbots into their digital offerings. But is building a chatbot always a good idea? In this talk, we’ll channel our inner Agent Scully—skeptical but willing to investigate—as we guide you through seven critical questions that can help determine whether a chatbot is a wise investment for your company.

# Chatbot

# Scully

# Agent

Bassey Etim & Erica Greene · Mar 21st, 2025

35:08

Video

Quantized LLM Training at Scale with ZeRO++ // Guanhua Wang // AI in Production 2025

Communication is the major bottleneck in large-scale LLM training. In ZeRO++, we quantize both weights and gradients during training in order to reduce the communication volume by 4x, which leads to end-to-end training time reduction by over 50%.
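
To make the idea of quantizing values before communication concrete, here is a minimal sketch assuming simple symmetric int8 quantization of one block of float32 values. It is not ZeRO++'s actual scheme and does not reproduce the figures quoted above; `quantize_block`, `dequantize_block`, and the sample weights are hypothetical names and data chosen for illustration.

```python
import struct


def quantize_block(values: list[float]) -> tuple[float, bytes]:
    """Symmetric int8 quantization of one block: returns (scale, packed int8 bytes)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Map each value to the nearest integer step and clamp to the int8 range used.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, struct.pack(f"{len(quantized)}b", *quantized)


def dequantize_block(scale: float, payload: bytes) -> list[float]:
    """Recover approximate floats from the int8 payload."""
    return [q * scale for q in struct.unpack(f"{len(payload)}b", payload)]


weights = [0.5, -1.0, 0.25, 0.75]                            # assumed sample data
full_precision = struct.pack(f"{len(weights)}f", *weights)   # 4 bytes per value
scale, payload = quantize_block(weights)                     # 1 byte per value
restored = dequantize_block(scale, payload)                  # lossy reconstruction
```

The payload is smaller than the float32 encoding at the cost of a bounded rounding error per value; a real system also has to transmit the per-block scale.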

# LLM Training

# Microsoft

# Zero++

Guanhua Wang · Mar 21st, 2025

24:35

Video

Accelerating Growth Through Optimizing GPU Usage // Sahil Khanna // AI in Production 2025

The presentation explores the critical importance of optimizing GPU usage for generative AI models. It delves into the journey of Adobe's Compute Platform, highlighting the challenges faced and the innovative solutions implemented to enhance GPU utilization, resource management, and reliability. The presentation also provides an overview of the AI Compute Platform Architecture and acknowledges the contributions of the dedicated team members who made these advancements possible.

# GPU

# Adobe

Sahil Khanna · Mar 21st, 2025

23:53

Video

Synthetic data: Breaching the low data barrier in Industrial Computer Vision systems // Vasu Sharma

High-quality data collection remains one of the biggest bottlenecks in deploying AI systems at scale. Today, nearly 54% of AI projects stall at the proof-of-concept stage due to prolonged data acquisition challenges. In industries like manufacturing and industrial automation, gathering just a handful of images for object detection tasks can take six months to a year, given the complexity of these environments and the need for highly reliable models. With the rise of generative AI, synthetic data presents a transformative approach. This can accelerate data collection, reduce development cycles, and enable faster deployment of robust AI models in production.

# Synthetic Data

# Meta

# Computer Vision

Vasu Sharma & Qasim Wani · Mar 26th, 2025

17:30

Video

Knowledge Graphs in Biomedicine // Dimitrios Athanasakis // AI in Production 2025

Knowledge Graphs are the lingua franca of biomedicine. The talk introduces knowledge graphs and shows how modern deep learning approaches can be applied to reasoning over knowledge graphs and how this translates to actionable insights for the biomedical domain.

# Biomedicine

# Knowledge Graphs

Dimitrios Athanasakis · Mar 26th, 2025

14:42

Video

Machine Consciousness? Get Real! // Ron Chrisley // AI in Production 2025

In discussions about AI, two extreme views dominate: some claim machine consciousness is impossible, while others suggest we are already there—or that we just need larger models and more data. Both are wrong. This talk will cut through the hype and explore what consciousness actually entails, why current AI models don’t have it, and what it would take to build AI systems that might one day possess true awareness. I'll critically examine both the overblown optimism that assumes consciousness is just a scaling issue and the overly rigid skepticism that deems it forever out of reach. By breaking down the philosophical and technical dilemmas involved, we can better understand what’s missing in today’s AI and what meaningful progress in this space might look like.

# Machine consciousness

# AI awareness

Ron Chrisley · Mar 26th, 2025

13:10

Video

Human-AI Collaboration: It's Not So Simple // C. Merrell Stone // AI in Production 2025

It’s very easy to get caught up in imagining all of the amazing things AI will one day be able to help us with. But many organizations cannot afford to invest in pipe dreams. Instead, they need to focus on what is achievable now. In this talk we'll explore a specific case study about a project designed to create a hybrid human/AI agent system and extract some high-level principles that will give participants a great place to start as they build their own agential systems.

# Hybrid human

# AI

# AI Agent

Merrell Stone · Mar 26th, 2025

26:49

Video

Expanding Analytics to All Data with GenAI, Graph, and Visual Analytics // Weidong Yang

Existing BI and big data solutions primarily consume structured data, which accounts for only about 20% of enterprise information, leaving vast amounts of unstructured data underutilized. In this talk, we introduce GraphBI, which aims to address this challenge by combining GenAI, graph technology, and visual analytics to unlock the full potential of enterprise data. Technologies like Retrieval-Augmented Generation (RAG) and GraphRAG enhance summarization and Q&A but often function as black boxes, making verification difficult. GraphBI takes a different approach: it uses GenAI for data pre-processing, transforming unstructured data into a graph-based format. This step-by-step workflow keeps the analytics process transparent and trustworthy. We’ll walk through the GraphBI workflow, covering best practices and challenges, including architectural considerations for projects of varying scales; data pre-processing, including knowledge map extraction and entity resolution; and iterative analytics with a BI-focused graph grammar. This approach uniquely surfaces business insights by effectively incorporating all types of data.

# GenAI

# Graphs

# Visualization

Weidong Yang · Mar 24th, 2025

23:05

Video

Deploying Agentic AI to Navigate Industrial Processes // Sebastian Kukla // AI in Production

What should industry know before and during an Agentic AI implementation? What does it actually look like, and what are the responsibilities of the consumer? For developers: what is going on in the heads of your customers, and how do you coach them through it?

# Agentic

# AI in Industry

Sebastian Kukla · Mar 24th, 2025

26:01

Video

Fine-Tuning is Broken: Why You're Doing It Wrong // Tanmay Chopra // AI in Production 2025

Fine-tuning isn't just about throwing more data at a model with the same pretraining loss. That’s just extended pretraining. True fine-tuning means modifying loss functions, adjusting output heads, and optimizing for real-world constraints like confidence calibration, consistency, and latency. This talk explores how misguided fine-tuning practices lead to brittle, inefficient models and demonstrates practical strategies to tailor models to production needs. We dive into when to fine-tune and the advantages of true fine-tuning (from output constraints to confidence scores to drastically lower latency), and show how fine-tuning can be about more than just style.

# Fine-tuning LLMs

Tanmay Chopra · Mar 24th, 2025

25:27