MLOps Community
# LLM Use Cases
# LLM in Production
# MLOps tooling

The State of Production Machine Learning in 2024 // Alejandro Saucedo // AI in Production

As the number of production machine learning use cases increases, we find ourselves facing new and bigger challenges where more is at stake. Because of this, it's critical to identify the key areas to focus our efforts on, so we can ensure we're able to transition from machine learning models to reliable production machine learning systems that are robust and scalable. In this talk we dive into the state of production machine learning in 2024, covering the concepts that make production machine learning so challenging, as well as some of the recommended tools available to tackle these challenges. We will take a deep dive into the production ML tooling ecosystem and into best practices abstracted from production use cases of machine learning operations at scale, as well as how to leverage tools that will allow us to deploy, explain, secure, monitor, and scale production machine learning systems.
Alejandro Saucedo
Demetrios Brinkmann
Alejandro Saucedo & Demetrios Brinkmann · Feb 25th, 2024
Denys Linkov
Denys Linkov · Jul 21st, 2023
What are some of the key differences between using 100M and 100B parameter models in production? In this talk, Denys from Voiceflow covers how their MLOps processes have differed between smaller transformer models and LLMs. He walks through how the four main production models Voiceflow uses differ, and the processes and product planning behind each one. The talk covers prompt testing, automated training, real-time inference, and more!
# LLM in Production
# Production Models
# Voiceflow
# voiceflow.com
24:44
Gerred Dillon
Gerred Dillon · Jul 21st, 2023
Despite the power of best-in-class Large Language Models and Generative AI, the use of hosted APIs for models in highly sensitive and regulated environments is being challenged. From fine-tuning and embedding sensitive data to creating small models in edge and air-gapped environments, users in these regulated environments need production-ready ways to run and observe models. Beyond that, both the software and the models being deployed need to have the Authorization to Operate in every one of these environments. LeapfrogAI is an open-source, open-contribution set of tools designed to meet the challenging requirements of these environments. Come learn about what makes these environments so rigorous, the work going on to enable Defense missions to use Generative AI safely and successfully, and how LeapfrogAI enables these missions.
# LLM in Production
# Local LLMs
# Defense Unicorn
23:07
Maxime Beauchemin
Demetrios Brinkmann
Maxime Beauchemin & Demetrios Brinkmann · Jul 21st, 2023
It’s clear that test-driven development plays a pivotal role in prompt engineering, potentially even more so than in traditional software engineering. By embracing TDD, product builders can effectively address the unique challenges presented by AI systems and create reliable, predictable, and high-performing products that harness the power of AI.
# LLM in Production
# AI Product Development
# Preset
21:26
Bradley Heilbrun
Demetrios Brinkmann
Bradley Heilbrun & Demetrios Brinkmann · Jul 21st, 2023
GPU-enabled hosts are a significant driver of cloud costs for teams serving LLMs in production. Preemptible instances can provide significant savings but generally aren’t fit for highly available services. This lightning talk tells the story of how Replit switched to preemptible GKE nodes, tamed the ensuing chaos, and saved buckets of cash while improving uptime.
# LLM in Production
# Optimizing Server Startup
# Repl.it
12:42
Xin Liang
Demetrios Brinkmann
Xin Liang & Demetrios Brinkmann · Jul 21st, 2023
Large language models (LLMs) have revolutionized AI, breaking down barriers to entry to cutting-edge AI applications, ranging from sophisticated chatbots to content creation engines.
# LLM in Production
# LLM-based Feature Extraction
# Canva
27:02
Vipul Ved Prakash
Vipul Ved Prakash · Jul 21st, 2023
Creating a new LLM is a difficult and expensive process, and there are several aspects that we need to get right — (1) a broad training dataset, (2) a strong base model, (3) a well-aligned instruction dataset, (4) a carefully designed moderation subsystem, and (5) cost-effective training infrastructure coupled with an efficient software stack. Together's central thesis is that these processes can be open-sourced, and we can harness the power of community to build and improve models, in the same way great open-source software has been built for decades. In this talk, I will introduce RedPajama, an open-source effort driven by Together and collaborators, and show how to build an LLM with the power of community.
# LLM in Production
# RedPajama
# Together
27:52
Aravind Srinivas
Demetrios Brinkmann
Aravind Srinivas & Demetrios Brinkmann · Jul 21st, 2023
Perplexity AI is an answer engine that aims to deliver accurate answers to questions using LLMs. Perplexity's CEO Aravind Srinivas will introduce the product and discuss some of the challenges associated with building LLMs.
# LLM in Production
# Power Consumer Search
# Perplexity AI
37:04
Travis Cline
Demetrios Brinkmann
Travis Cline & Demetrios Brinkmann · Jul 21st, 2023
A quick run-through of our recent project to visualize and explore the MLOps community trends by building interactive tools to see Slack message content in new lights.
# LLM in Production
# LLM Stack
# Virta
10:38
Mathieu Bastian
Mathieu Bastian · Jul 17th, 2023
We've built GetYourGuide's ChatGPT plugin in about a week with a few engineers. It was an interesting experience we would love to share.
# LLM in Production
# ChatGPT Plugin
# GetYourGuide
9:26
Chunting Zhou
Chunting Zhou · Jul 17th, 2023
How do you turn a language model into a chatbot without any user interactions? LIMA is a LLaMA-based model fine-tuned on only 1,000 curated prompts and responses, which produces shockingly good responses.
* No user data
* No model distillation
* No RLHF
What does this tell us about language model alignment? In this talk, Chunting shares what her team learned throughout the process.
# LLM in Production
# LIMA
# FAIR Labs
11:18