MLOps Community
# LLM Use Cases
# LLM in Production
# MLOps tooling

The State of Production Machine Learning in 2024 // Alejandro Saucedo // AI in Production

As the number of production machine learning use cases increases, we find ourselves facing new and bigger challenges where more is at stake. Because of this, it's critical to identify the key areas to focus our efforts on, so we can ensure we're able to transition from machine learning models to reliable production machine learning systems that are robust and scalable. In this talk we dive into the state of production machine learning in 2024, covering the concepts that make production machine learning so challenging, as well as some of the recommended tools available to tackle these challenges. We will take a deep dive into the production ML tooling ecosystem and into best practices abstracted from production use cases of machine learning operations at scale, as well as how to leverage tools that will allow us to deploy, explain, secure, monitor, and scale production machine learning systems.
Alejandro Saucedo
Demetrios Brinkmann
Alejandro Saucedo & Demetrios Brinkmann · Feb 25th, 2024
Denys Linkov
Denys Linkov · Jul 21st, 2023
What are some of the key differences between using 100M and 100B parameter models in production? In this talk, Denys from Voiceflow covers how their MLOps processes have differed between smaller transformer models and LLMs. He walks through how the four main production models Voiceflow uses differ, and the processes and product planning behind each one. The talk covers prompt testing, automated training, real-time inference, and more!
# LLM in Production
# Production Models
# Voiceflow
# voiceflow.com
24:44
Gerred Dillon
Gerred Dillon · Jul 21st, 2023
Despite the power of best-in-class Large Language Models and Generative AI, the use of hosted APIs for models in highly sensitive and regulated environments is being challenged. From fine-tuning and embedding sensitive data to creating small models in edge and air-gapped environments, users in these regulated environments need production-ready ways to run and observe models. Beyond that, both the software and the models being deployed need to have the Authorization to Operate in every one of these environments. LeapfrogAI is an open-source, open-contribution set of tools designed to meet the challenging requirements of these environments. Come learn about what makes these environments so rigorous, the work going on to enable Defense missions to use Generative AI safely and successfully, and how LeapfrogAI enables these missions.
# LLM in Production
# Local LLMs
# Defense Unicorn
23:07
Maxime Beauchemin
Demetrios Brinkmann
Maxime Beauchemin & Demetrios Brinkmann · Jul 21st, 2023
It’s clear that test-driven development plays a pivotal role in prompt engineering, potentially even more so than in traditional software engineering. By embracing TDD, product builders can effectively address the unique challenges presented by AI systems and create reliable, predictable, and high-performing products that harness the power of AI.
# LLM in Production
# AI Product Development
# Preset
21:26
Bradley Heilbrun
Demetrios Brinkmann
Bradley Heilbrun & Demetrios Brinkmann · Jul 21st, 2023
GPU-enabled hosts are a significant driver of cloud costs for teams serving LLMs in production. Preemptible instances can provide significant savings but generally aren’t fit for highly available services. This lightning talk tells the story of how Replit switched to preemptible GKE nodes, tamed the ensuing chaos, and saved buckets of cash while improving uptime.
# LLM in Production
# Optimizing Server Startup
# Repl.it
12:42
Xin Liang
Demetrios Brinkmann
Xin Liang & Demetrios Brinkmann · Jul 21st, 2023
Large language models (LLMs) have revolutionized AI, breaking down barriers to entry to cutting-edge AI applications, ranging from sophisticated chatbots to content creation engines.
# LLM in Production
# LLM-based Feature Extraction
# Canva
27:02
Vipul Ved Prakash
Vipul Ved Prakash · Jul 21st, 2023
Creating a new LLM is a difficult and expensive process, and there are several aspects that we need to get right — (1) a broad training dataset, (2) a strong base model, (3) a well-aligned instruction dataset, (4) a carefully designed moderation subsystem, and (5) cost-effective training infrastructure coupled with an efficient software stack. Together's central thesis is that these processes can be open-sourced, and we can harness the power of community to build and improve models, in the same way great open-source software has been built for decades. In this talk, I will introduce RedPajama, an open-source effort driven by Together and collaborators, and show how to build an LLM with the power of community.
# LLM in Production
# RedPajama
# Together
27:52
Aravind Srinivas
Demetrios Brinkmann
Aravind Srinivas & Demetrios Brinkmann · Jul 21st, 2023
Perplexity AI is an answer engine that aims to deliver accurate answers to questions using LLMs. Perplexity's CEO Aravind Srinivas will introduce the product and discuss some of the challenges associated with building LLMs.
# LLM in Production
# Power Consumer Search
# Perplexity AI
37:04
Travis Cline
Demetrios Brinkmann
Travis Cline & Demetrios Brinkmann · Jul 21st, 2023
A quick run-through of our recent project to visualize and explore the MLOps community trends by building interactive tools to see Slack message content in new lights.
# LLM in Production
# LLM Stack
# Virta
10:38
Mathieu Bastian
Mathieu Bastian · Jul 17th, 2023
We've built GetYourGuide's ChatGPT plugin in about a week with a few engineers. It was an interesting experience we would love to share.
# LLM in Production
# ChatGPT Plugin
# GetYourGuide
9:26
Chunting Zhou
Chunting Zhou · Jul 17th, 2023
How do you turn a language model into a chatbot without any user interactions? LIMA is a LLaMA-based model fine-tuned on only 1,000 curated prompts and responses, which produces shockingly good responses.
* No user data
* No model distillation
* No RLHF
What does this tell us about language model alignment? In this talk, Chunting shares what her team learned throughout the process.
# LLM in Production
# LIMA
# FAIR Labs
11:18