Large Language Models have taken the world by storm. But what are the real use cases? What are the challenges in productionizing them?
In this event, you will hear from practitioners about how they are dealing with things such as cost optimization, latency requirements, trust of output and debugging.
You will also get the opportunity to join workshops that will teach you how to set up your use cases and skip over all the headaches.
If software 2.0 was about designing data collection for neural network training, software 3.0 is about manipulating foundation models at a system level to create great end-user experiences. AI-native applications are “GPT wrappers” the way SaaS companies are database wrappers. This talk discusses the huge design space for software 3.0 applications and explains Conviction’s framework for value, defensibility and strategy in specifically assessing these companies.
In this talk, Shreya will share a candid look back at a year dedicated to developing reliable AI tools in the open-source community. The talk will explore which tools and techniques have proven effective and which ones have not, providing valuable insights from real-world experiences. Additionally, Shreya will offer predictions on the future of AI tooling, identifying emerging trends and potential breakthroughs. This presentation is designed for anyone interested in the practical aspects of AI development and the evolving landscape of open-source technology, offering both reflections on past lessons and forward-looking perspectives.
This talk will cover how we fine-tuned a model to generate health insurance appeals. If you've ever gotten a health insurance denial and just kind of given up, hopefully the topic speaks to you. Even if you have not, come learn about our adventures in using different cloud resources for fine-tuning and, finally, an on-prem Kubernetes-based deployment in Fremont, CA, including when the graphics cards would not fit in the servers.
This session examines the pivotal role of retrieval evaluation in Large Language Model (LLM)-based applications like RAG, emphasizing its direct impact on the quality of generated responses. We explore the correlation between retrieval accuracy and answer quality, highlighting the significance of meticulous evaluation methodologies.
It is possible to build knowledge graphs (KGs) with LLMs through prompt engineering. But are we boiling the ocean? Can we improve the quality of the generated graph elements by using - dare I say it - SLMs (small language models)?
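As a rough illustration of the prompt-engineering approach the talk questions, here is a minimal sketch of triple extraction for KG construction. The prompt template and parser are hypothetical stand-ins, not the speaker's method; any language model, large or small, could fill the `{text}` slot's completion.

```python
# A hypothetical prompt asking a language model to emit KG triples
# in a fixed, easy-to-parse line format.
KG_EXTRACTION_PROMPT = """Extract (subject, predicate, object) triples from the text.
Return one triple per line as: subject | predicate | object

Text: {text}
Triples:"""

def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Parse the model's line-oriented output; skip malformed lines."""
    triples = []
    for line in llm_output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

# Simulated model output (no real LLM call here):
sample = "Ada Lovelace | collaborated with | Charles Babbage"
print(parse_triples(sample))  # [('Ada Lovelace', 'collaborated with', 'Charles Babbage')]
```

The graph-quality question the talk raises lives largely in that parser: how many lines come back malformed, duplicated, or hallucinated is one practical way to compare an LLM against an SLM on the same prompt.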
Over the past few years, we have witnessed a rapid evolution of generative large language models (LLMs), culminating in the creation of unprecedented tools like ChatGPT. Generative AI has now become a popular topic among both researchers and the general public. Now more than ever before, it is important that researchers and engineers (i.e., those building the technology) develop an ability to communicate the nuances of their creations to others. A failure to communicate the technical aspects of AI in an understandable and accessible manner could lead to widespread public scepticism (e.g., research on nuclear energy went down a comparable path) or the enactment of overly-restrictive legislation that hinders forward progress in our field. Within this talk, we will take a small step towards solving these issues by proposing and outlining a simple, three-part framework for understanding and explaining generative LLMs.
Lots of companies are investing time and money in LLMs, some even have customer-facing applications, but what about some common sense? Impact assessment | Risk assessment | Maturity assessment.
As builders, engineers, and creators, we often think about the full life-cycle of a machine learning or AI project: gathering data, cleaning the data, and training and evaluating a model. But what about the experiential qualities of an AI product that we want our users to experience on the front end? Join me to learn about the foundational questions I ask myself and my team while building products that incorporate LLMs.
In the rapidly evolving field of natural language processing, the evaluation of Large Language Models (LLMs) has become a critical area of focus. We will explore the importance of a robust evaluation strategy for LLMs and the challenges associated with traditional metrics such as ROUGE and BLEU. We will conclude the talk with some nontraditional metrics, such as correctness, faithfulness, and freshness, that are becoming increasingly important in the evaluation of LLMs.
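A minimal sketch of the challenge with n-gram metrics that this abstract alludes to: a simplified ROUGE-1-style overlap score (not the official ROUGE implementation) can rank a factually wrong answer above a correct paraphrase, which is why metrics like correctness and faithfulness matter.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1: a simplified stand-in for ROUGE-1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "The Eiffel Tower is located in Paris"
wrong = "The Eiffel Tower is located in Rome"     # factually wrong, high word overlap
right = "You can find the Eiffel Tower in Paris"  # correct paraphrase, lower overlap

print(rouge1_f1(wrong, reference))  # ~0.86: the wrong answer scores higher
print(rouge1_f1(right, reference))  # ~0.67
```

The wrong answer shares six of seven reference words, so it outscores the correct paraphrase; an n-gram metric has no notion of the one word that actually matters.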
We will discuss how we can go from a developed solution to production in the context of vision models, exploring fine-tuning LoRAs, upscaling pipelines, constraint-based generation, and step-by-step improvement of overall performance and quality for a production-ready service.
In this presentation, we navigate the iterative development of Large Language Model (LLM) applications and the intricacies of LLMOps design. We emphasize the importance of anchoring LLM development in practical business use cases and a deep understanding of your own data. Continuous Integration and Continuous Deployment (CI/CD) should be a core component of LLM pipeline deployment, just as in Machine Learning Operations (MLOps). However, the unique challenges posed by LLMs include addressing data security, API governance, the imperative need for GPU infrastructure in inference, integration with external vector databases, and the absence of clear evaluation rubrics. Join us as we illuminate strategies to overcome these challenges and make strategic adaptations. Our journey includes reference architectures for the seamless productionization of RAGs on the Databricks Lakehouse platform.
I'll be talking about the challenges of evaluating language models, how to address them, what metrics you can use, and what datasets are available. We'll also discuss the difficulties of continuous evaluation in production and common pitfalls.
Takeaways: A call to action to contribute to public evaluation datasets and a more concerted effort from the community to reduce harmful bias.
A quick rundown of Helix and how it helps you fine-tune text and image AI, all using the latest open-source models. Kai will discuss some of the issues that cropped up when creating and running a fine-tuning-as-a-service platform.
GitHub Copilot, based on GPT, is truly a game changer when it comes to automating code generation, boosting developer productivity by more than 100%. In this session, you will learn what GitHub Copilot is, and you will build a console web app in about 10 minutes with GitHub Copilot!
What's old is new again. As we gain more experience with RAG, we're starting to pay more attention to improving retrieval quality. From hybrid search to reranking, RAG pipelines are starting to look more and more like recommender pipelines. In this lightning talk we'll take a brief look at the parallels between the two, and we'll check out how to do hybrid reranking with LanceDB to improve your retrieval quality.
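To make "hybrid reranking" concrete before the talk: a common way to fuse keyword-search and vector-search result lists is reciprocal rank fusion (RRF). The sketch below is the bare fusion idea in plain Python, not LanceDB's API; the doc ids and result lists are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists (best first) with reciprocal rank fusion.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is the smoothing constant commonly used with RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # hypothetical keyword-search order
vector_hits = ["doc1", "doc9", "doc3"]  # hypothetical embedding-search order
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['doc1', 'doc3', 'doc9', 'doc7']
```

Note how `doc1` wins by appearing near the top of both lists: exactly the behavior that makes hybrid pipelines resemble the candidate-generation-plus-ranking shape of recommender systems.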
Mihail Eric, one of the community members, is a founder by day and a stand-up comic by night!
A year ago, with the introduction of GPT-4, the sphere of machine learning was transformed completely. These advancements and LLMs unlocked the capability to address previously unsolvable problems, but also commoditized machine learning.
In this session, we will delve into the intricacies of the emerging LLMOps Stack, exploring the tools and best practices that empower organizations to harness the full potential of LLMs.
Large language models (LLMs) can unlock great productivity in software engineering, but it's important to acknowledge their limitations, particularly in generating robust code. This talk, "Accelerate ML Production with Agents," discusses applying the abstraction of LLMs with tools to tackle complex challenges. Agents have the potential to streamline the orchestration of ML workflows and simplify customization and deployment processes.
Large Language Models (LLMs) are revolutionizing how users can search for, interact with, and generate new content. There's been an explosion of interest around Retrieval Augmented Generation (RAG), enabling users to build applications such as chatbots, document search, workflow agents, and conversational assistants using LLMs on their private data.
While setting up naive RAG is straightforward, building production RAG is very challenging. There are parameters and failure points along every stage of the stack that an AI engineer must solve in order to bring their app to production.
This talk will cover the overall landscape of pain points and solutions around building production RAG, and also paint a picture of how this architecture will evolve over time.
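To ground the "parameters and failure points along every stage" framing, here is a toy naive-RAG skeleton. Every helper is a hypothetical stand-in (the embedding is a character-frequency vector, not a real model, and no LLM is called); the point is only to show the stages where production RAG can fail: chunking, retrieval, and prompt construction.

```python
def chunk(text: str, size: int = 200) -> list[str]:
    """Stage 1: split documents; chunk size is itself a tunable failure point."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    """Toy embedding: 26-dim letter-frequency vector (stand-in for a real model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Stage 2: similarity search; recall here bounds answer quality downstream."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Stage 3: prompt construction, before handing off to an LLM for generation."""
    return "Answer using only this context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

docs = ["paris is in france", "tokyo is in japan"]
print(build_prompt("Where is Paris?", retrieve("paris france", docs, top_k=1)))
```

In a production system each of these stages gets its own parameters and its own evaluation (chunking strategy, embedding model, top-k, reranking, prompt format), which is where the complexity this talk covers comes from.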