Join us for two days of talking with some of our favorite people at the forefront of using LLMs in the wild, and an in-person workshop in San Francisco on how to build and deploy LLM-based apps, hosted by Anyscale.
There will be over 50 speakers from Stripe, Meta, Canva, Databricks, Anthropic, Cohere, Redis, LangChain, Chroma, Humanloop and so many more.
This all started after we put together the LLM in production survey and realized there are still lots of unknowns when dealing with LLMs, especially at scale. We open-sourced all the responses and decided that if no one was going to talk about working with LLMs in a non-overhyped way, we would have to.
Let's discover how to use these damn probabilistic models in the best ways possible without sacrificing the necessary software design building blocks.
Expect all the fun and learnings from the first one. DOUBLED.
And remember, there will be some sweeeet sweet swag giveaways.
Huge shoutout to all the sponsors of this event; find more info about them below.
Plus a little tl;dr summary of the LLM in production survey report.
Large language models are fluent text generators, but they often make errors, which makes them difficult to deploy in high-stakes applications. Using them in more complicated pipelines, such as retrieval pipelines or agents, exacerbates the problem. In this talk, Matei will cover emerging techniques in the field of “LLMOps” — how to build, tune and maintain LLM-based applications with high quality. The simplest tools are ones to test and visualize LLM results, some of which are now being incorporated into MLOps frameworks like MLflow. However, there are also rich techniques emerging to “program” LLM pipelines and control LLMs’ outputs to achieve desired goals.
Matei will discuss Demonstrate-Search-Predict (DSP) from his group as an example programming framework that can automatically improve an LLM-based application based on feedback, and other open-source tools for controlling outputs and generating better training and evaluation data for LLMs. This talk is based on his team's experience deploying LLMs in many applications at Databricks, including the QA bot on the company's public website, internal QA bots, code assistants, and others, all of which are making their way into Databricks' MLOps products and MLflow.
What do we need to be aware of when building for production? In this talk, we will explore the key challenges that arise when taking an LLM to production.
Language models are very complex, which introduces several challenges in interpretability. The large amounts of data required to train these black-box language models make it even harder to understand why a language model generates a particular output. In the past, transformer models were typically evaluated using perplexity, BLEU score or human evaluation.
However, LLMs amplify the problem because of their generative nature, which makes them more susceptible to hallucinations and factual inaccuracies. Evaluation therefore becomes an important concern.
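For readers new to the metrics named above, perplexity is the exponential of the average negative log-likelihood per token. Here is a minimal sketch, assuming you already have the natural-log probabilities a model assigned to each token; the function name and the example values are illustrative and do not come from the talk.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_logprobs: natural-log probabilities the model assigned to each
    observed token (hypothetical input; obtain these from your own model).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical example: the model gave every token a probability of 0.25.
example_logprobs = [math.log(0.25)] * 4
print(perplexity(example_logprobs))
```

Lower is better: a model that assigns each token a probability of 0.25 is, on average, as uncertain as a uniform choice among four options. Perplexity says nothing about factual accuracy, which is part of why it falls short for generative LLMs.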
Bring your prompts to the chat 'cause we will be improvising songs from the audience's suggestions!
In the last LLM in Production event, I spoke on some of the ways we've seen people use a vector database for large language models. This included use cases like information/context retrieval, conversational memory for chatbots, and semantic caching.
These are great and make for flashy demos; however, using them in production isn't trivial. Oftentimes, the less flashy side of these use cases can present huge challenges, such as: Advice on prompts? How do I chunk up text? What if I need HIPAA compliance? On-premise? What if I change my embeddings model? What index type? How do I do A/B tests? Which cloud platform or model API should I use? Deployment strategies? How can I inject features from my feature platform? LangChain or LlamaIndex or RelevanceAI???
This talk will distill a year+ worth of deploying Redis for these use cases for customers down into 20 minutes.
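As a taste of one of those questions, "How do I chunk up text?", here is a minimal sketch of fixed-size chunking with overlap, a common starting point before embedding documents. It is not the approach recommended in the talk. The function name, the word-based splitting, and the default sizes are all assumptions chosen for illustration.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks that share `overlap` words.

    chunk_size and overlap are counted in whitespace-separated words;
    both defaults are arbitrary and should be tuned for your data.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary at least partly visible in both neighboring chunks, at the cost of storing and embedding some text twice.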
As Foundation Models (FMs) continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This talk will describe our work on applying foundation models to structured data tasks like data linkage, cleaning and querying. We will then discuss challenges and solutions that these models present for production deployment in the modern data stack.
How do you use a Large Language Model (LLM) to create memes? We’ll discuss the unique ImgFlip dataset, the selection and fine-tuning of a commercially usable LLM, and the associated challenges. Of course, we’ll also demonstrate the model prototype itself. Finally, we’ll cover the challenges we anticipate in productionizing an LLM that is used by millions of users.
Autonomous AI agents have gotten a lot of attention recently, but they're mostly just toys. What are the primitives that we need to build more reliable agents, and what are the main business use cases that agentic automation will enable over the next few years?
Humanloop have now seen hundreds of companies go on the journey from playground to production. In this talk we’ll share case studies of what has and hasn’t worked. We’ll share the common pitfalls, emerging best practices, and suggestions for how to plan in such a quickly evolving space.
Put down the screen for a moment, close your eyes and bliss out in between the sessions.
While we've seen great progress on open source LLMs, we haven't seen the same level of progress on systems to serve those LLMs in production contexts. In this presentation, I work through some of the challenges of taking open source models and serving them in production.
Large Language Models require a new set of tools... or do they? K8s is a beast and we like it that way. How can we best leverage all the battle-hardened tech that k8s has to offer to make sure that our LLMs go brrrrrrr? Let's talk about it in this chat.
Retrieval augmented generation with embeddings and LLMs has become an important workflow for AI applications.
While embedding-based retrieval is very powerful for applications like 'chat with my documents', users and developers should be aware of key limitations, and techniques to mitigate them.
You think you got prompting skills? Been reading too many Reddit threads thinking you can crack the code? Well, let's see what you are capable of!
Generalized models solve general problems. The real value comes from training a large language model (LLM) on your own data and fine-tuning it to deliver on your specific ML task.
Now you can build your own custom LLM, trained on your data and fine-tuned for your generative or predictive task in ten lines of code with Predibase and Ludwig, the low-code deep learning framework developed and open sourced by Uber, now maintained as part of the Linux Foundation. Using Ludwig’s declarative approach to model customization, you can take a pre-trained large language model like LLaMA and tune it to output data specific to your organization, with outputs conforming to an exact schema. This makes building LLMs fast, easy, and economical.
In this session, Travis Addair, CTO of Predibase and co-maintainer of open-source Ludwig, will share how LLMs can be tailored to solve specific tasks from classification to content generation, and how you can get started building a custom LLM in just a few lines of code.
Copilots embedded within SaaS applications have become one of the dominant ways of leveraging LLMs within products. In this lightning talk, I’ll review some of the dominant UI paradigms and features, general design patterns and system architectures, and top challenges and future frontiers of production copilot systems.
Many researchers have recently proposed different approaches to building recommender systems using LLMs. These methods convert different recommendation tasks into either language understanding or language generation templates. This talk highlights some of the recent work done on this theme.
This talk covers what my team has learned while building Code Suggestions, from the model and ML infrastructure to evaluation, compute, and cost.