Join us for two days of talking with some of our favorite people at the forefront of using LLMs in the wild, and an in-person workshop in San Francisco on how to build and deploy LLM-based apps, hosted by Anyscale.
There will be over 50 speakers from Stripe, Meta, Canva, Databricks, Anthropic, Cohere, Redis, LangChain, Chroma, Humanloop, and so many more.
This all started after we put together the LLMs in-production survey and realized there are still lots of unknowns when dealing with LLMs, especially at scale. We open-sourced all the responses and decided that if no one else was going to talk about working with LLMs in a non-over-hyped way, we would have to.
Let's discover how to use these damn probabilistic models in the best ways possible without sacrificing the necessary software design building blocks.
Expect all the fun and learnings from the first one. DOUBLED.
And remember, there will be some sweeeet sweet swag giveaways.
Huge shoutout to all the sponsors of this event; find more info about them below.
The journey from LLM PoCs to production deployment is fraught with unique challenges, from maintaining model reliability to effectively managing costs. In this talk, we delve deep into these complexities, outlining design patterns for successful LLM production, the role of vector databases, strategies to enhance reliability, and cost-effective methodologies.
Building a chatbot is not easy... Or is it? We need:
An embedding model that translates questions into vectors. A vector database to search. An LLM to generate the answers.
We can orchestrate the job using LangChain with minimal development.
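The three components above can be sketched end to end. This is a minimal toy sketch, not the talk's actual implementation: the bag-of-words `embed` function stands in for a real embedding model, the linear-scan `VectorStore` stands in for a real vector database, and the final string formatting stands in for the LLM generation call.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts. A real system would call a
    # sentence-embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    # Toy vector database: a linear scan over embedded documents.
    def __init__(self, docs):
        self.docs = [(d, embed(d)) for d in docs]

    def search(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

def answer(question: str, store: VectorStore) -> str:
    context = store.search(question, k=1)[0]
    # Stand-in for the LLM call: a real chain would send the retrieved
    # context plus the question to a model for generation.
    return f"Based on our docs: {context}"
```

In practice an orchestration framework like LangChain wires these same three pieces together for you, which is why the development effort stays small.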
Using Wardley Maps, we can understand value chains and map out the landscape. We can then use this to develop strategies and understand where to target our efforts.
Take a moment to randomly match with others in this event by participating in the networking sessions. To access the random introductions click on the match tab in the left sidebar.
Proprietary LLMs are difficult for enterprises to adopt because of security and data privacy concerns. Open-source LLMs can circumvent many of these problems. While open LLMs are incredibly exciting, they're also a nightmare to deploy and operate in the cloud. Aqueduct enables you to run open LLMs in a few lines of vanilla Python on any cloud infrastructure that you use.
There are key areas we must be aware of when working with LLMs. High costs and low latency requirements are just the tip of the iceberg. In this panel we will hear about common pitfalls and challenges we must keep in mind when building on top of LLMs.
It’s silly to think of training and using large LANGUAGE models without any input from the study of language itself. Linguistics is not the only field of knowledge that can improve LLMs, since they sit at the intersection of several fields; however, it can help us not only improve current model performance but also see clearly where future improvements will come from.
This session provides an overview of the evolving landscape of Generative AI, with a focus on the latest trends and technologies that shape this field. Designed with startups in mind, the talk offers practical insights on how to adapt and leverage these advancements to enhance their products. Attendees will acquire valuable knowledge to navigate the dynamic landscape of Generative AI, enabling them to stay up-to-date and harness untapped potential for the success of their startups.
Here’s the truth: troubleshooting models based on unstructured data is notoriously difficult. The measures typically used for drift in tabular data do not extend to unstructured data. The general challenge with measuring unstructured data drift is that you need to understand the change in relationships inside the unstructured data itself. In short, you need to understand the data in a deeper way before you can understand drift and performance degradation.
In this presentation, Claire Long will present findings from research on ways to measure vector/embedding drift for image and language models. With lessons learned from testing different approaches (including Euclidean and cosine distance) across billions of streams and use cases, she will dive into how to detect whether two unstructured language datasets are different — and, if so, how to understand that difference using techniques such as UMAP.
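One common baseline for the drift measurement described above is to compare the centroids of two embedding sets with the distance metrics the talk mentions. This is a simplified sketch of that idea, not the speaker's actual method; the `embedding_drift` helper and its centroid-based approach are illustrative assumptions.

```python
import math

def centroid(embeddings):
    # Average each dimension across a set of embedding vectors.
    dim = len(embeddings[0])
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def embedding_drift(baseline, production):
    # Drift score: distance between the centroid of the baseline
    # (e.g. training-time) embeddings and the centroid of the
    # current production embeddings. Larger means more drift.
    return euclidean(centroid(baseline), centroid(production))
```

A monitoring job would compute this score on a rolling window of production embeddings and alert when it exceeds a threshold calibrated on the baseline data.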
The rapid adoption of large language models (LLMs) is transforming how businesses communicate, learn, and work, prioritizing AI safety and security. This captivating and insightful talk will delve into the challenges and risks associated with LLM adoption and unveil AIShield.GuArdIan – a game-changing technology that enables businesses to leverage ChatGPT-like AI without compromising compliance. AIShield.GuArdIan's unique approach ensures legal, policy, ethical, role-based, and usage-based compliance, allowing companies to harness the power of LLMs safely. Join us on this riveting journey as we reshape the future of AI, empowering industries to unlock the full potential of LLMs securely and responsibly. Don't miss this opportunity to be at the forefront of responsible AI usage – reserve your seat today and take the first step towards a secure AI-powered future!
Large Language Models are an especially exciting opportunity for Operations: they excel at answering questions, completing sentences, and summarizing text while requiring ~100x less training data than the previous generation of models.
In this talk, Sophie will discuss lessons learned productionising Stripe’s first application of Large Language Modelling: providing answers to user questions for Stripe Support.
This lightning talk explores the challenges encountered in offering Large Language Models as a Service. As LLMs are becoming increasingly larger and more proficient, there are certain challenges that arise which need to be addressed to ensure the efficient and reliable delivery of LLMs as a Service. This talk delves into key challenges such as scalability, model optimization, cost-effectiveness, and data privacy.
Document question-answering is a popular LLM use case. LangChain makes it easy to assemble LLM components (e.g., models and retrievers) into chains that support question-answering. But it is not always obvious how to (1) evaluate the answer quality and (2) use this evaluation to guide improved QA chain settings (e.g., chunk size, retrieved docs count) or components (e.g., model or retriever choice). We recently released an open-source, hosted app to address these limitations (see blog post here). We have used this to compare the performance of various retrieval methods, including Anthropic's 100k context length model (blog post here). This talk will discuss our results and future plans.
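The evaluation loop described above can be sketched in a few lines. This is a toy harness, not the released app: the substring-based `grade_answer` stands in for the model-graded evaluation a production harness would typically use, and `qa_chain` is any callable that maps a question to an answer.

```python
def grade_answer(predicted: str, reference: str) -> bool:
    # Toy grader: checks whether the reference answer appears in the
    # prediction. A production harness would typically ask an LLM to
    # grade semantic equivalence instead.
    return reference.lower() in predicted.lower()

def evaluate_chain(qa_chain, eval_set):
    # eval_set: list of (question, reference_answer) pairs.
    # Returns the fraction of questions answered correctly, which can
    # then be compared across chunk sizes, retrievers, or models.
    results = [grade_answer(qa_chain(q), ref) for q, ref in eval_set]
    return sum(results) / len(results)
```

Running `evaluate_chain` over the same eval set with different chain configurations gives the side-by-side comparison the talk describes.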
The impressive reasoning abilities of LLMs can be an attractive proposition for many businesses, but using foundation models through APIs can be slow, with unpredictable latency. Self-hosting models can be an attractive alternative, but how do you choose which model to use, and if you have a latency or inference budget, how do you make it fit? We will discuss how pseudo-labeling, knowledge distillation, pruning, and quantization can ensure the highest efficiency possible.
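Of the techniques listed above, quantization is the simplest to illustrate. This is a minimal sketch of symmetric 8-bit weight quantization on a plain Python list; real implementations operate on tensors with per-channel scales.

```python
def quantize_int8(weights):
    # Symmetric 8-bit quantization: map each float weight to an
    # integer in [-127, 127] using a single scale factor.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate float weights for computation.
    return [q * scale for q in quantized]
```

Storing each weight as one byte instead of four shrinks the model roughly 4x, at the cost of the small rounding error visible when you dequantize; pruning and distillation trade accuracy for speed in analogous ways.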
You can’t build robust systems with inconsistent, unstructured text output from LLMs. Moreover, LLM integrations scare corporate lawyers, finance departments, and security professionals due to hallucinations, cost, lack of compliance (e.g., HIPAA), leaked IP/PII, and “injection” vulnerabilities. This talk will cover some practical methodologies for getting consistent, structured output from compliant AI systems. These systems, driven by open access models and various kinds of LLM wrappers, can help you delight customers AND navigate the increasing restrictions on "GPT" models.
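One practical methodology in the spirit of the talk is schema validation with re-prompting: reject any LLM output that is not well-formed, and retry until it is. This is an illustrative sketch, not the speaker's actual approach; the `REQUIRED_KEYS` schema and the `llm` callable are assumptions.

```python
import json

REQUIRED_KEYS = {"sentiment", "confidence"}  # hypothetical output schema

def parse_structured(raw: str):
    # Reject anything that is not valid JSON with the expected keys.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return None
    return data

def structured_call(llm, prompt, retries=3):
    # Re-prompt until the model returns parseable, schema-conforming
    # JSON, so downstream systems never see free-form text.
    for _ in range(retries):
        parsed = parse_structured(llm(prompt))
        if parsed is not None:
            return parsed
    raise ValueError("no valid structured output after retries")
```

Wrapping every model call this way is what turns an LLM from an unpredictable text generator into a component other systems can depend on.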
Large Language Models (LLMs) have shown remarkable capabilities in domains such as question-answering and information recall, but every so often, they just make stuff up. In this talk, we'll take a look at “LLM Hallucinations" and explore strategies to keep LLMs grounded and reliable in real-world applications.
We’ll start by walking through an example implementation of an "LLM-powered Support Center" to illustrate the problems caused by hallucinations. Next, I'll demonstrate how leveraging a searchable knowledge base can ensure that the assistant delivers trustworthy responses. We’ll wrap up by exploring the scalability of this approach and its potential impact on the future of AI-driven applications.
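The grounding strategy described above can be reduced to a simple rule: answer only from a retrieved document, and decline when nothing in the knowledge base matches well enough. This is a toy sketch of that rule, not the talk's implementation; the lexical `similarity` function stands in for embedding similarity, and the threshold is an illustrative assumption.

```python
import math
from collections import Counter

def similarity(a: str, b: str) -> float:
    # Toy lexical cosine similarity, standing in for the embedding
    # similarity a real searchable knowledge base would use.
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[t] * wb.get(t, 0) for t in wa)
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

def grounded_answer(question, knowledge_base, threshold=0.3):
    # Answer only from the best-matching document; if nothing clears
    # the threshold, decline rather than let the model improvise
    # (i.e., hallucinate).
    best = max(knowledge_base, key=lambda doc: similarity(question, doc))
    if similarity(question, best) < threshold:
        return "I don't know - no supporting document found."
    return f"According to our docs: {best}"
```

The key design choice is the explicit refusal path: a support assistant that says "I don't know" is far less costly than one that confidently invents a refund policy.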