MLOps Community
LIVESTREAM
Boosting LLMs: Performance, Scaling, and Structured Outputs
# LLMs
# Performance and Scalability
# SAS

Join us for an in-depth exploration of advanced techniques to enhance the performance and scalability of large language models (LLMs). This event features three insightful sessions:

Learn how linguistic rule-based models and traditional text analytics improve LLM accuracy and efficiency in real-world applications like public comment analysis, focusing on data curation, fine-tuning, and prompt-tuning.

Discover strategies for scaling LLMs in high-traffic environments, including the limitations of inference, cost optimization, and the trade-offs of implementing guardrails.

Explore BAML, a new programming language that enhances LLM output quality by optimizing algorithms and enabling cost-effective, smaller models to perform at top-tier levels.


Speakers
Ben Epstein
Founding Software Engineer @ Galileo
Tom Sabo
Advisory Solutions Architect @ SAS
Matt Squire
CTO and Co-founder @ Fuzzy Labs
Vaibhav Gupta
CEO @ Boundary ML
Agenda
4:00 PM, GMT
-
4:05 PM, GMT
Opening / Closing
Introduction
Ben Epstein
4:05 PM, GMT
-
4:25 PM, GMT
Presentation
Bending the Rules: How to Use Information Extraction Models to Improve the Performance of Large Language Models

Generative AI and Large Language Models (LLMs) are revolutionizing technology and redefining what's possible with AI. Harnessing the power of these transformative technologies requires careful curation of data to perform in both cost-effective and accurate ways. Information extraction models including linguistic rules and other traditional text analytics approaches can be used to curate data and aid in training, fine-tuning, and prompt-tuning, as well as evaluating the results generated by LLMs. By combining linguistic rule-based models with LLMs through this multi-modal approach to AI, we can help to improve the quality and accuracy of LLMs and enable them to perform better on various tasks while cutting costs. We will demonstrate this innovation with a real-world example in public comment analysis.
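The idea of combining linguistic rules with LLMs can be illustrated with a minimal sketch: simple pattern-based rules pre-classify public comments so that only relevant, curated text is passed on to the (more expensive) LLM. The rule patterns, topic names, and example comments below are invented for illustration and are not from the talk.

```python
# Illustrative sketch: rule-based information extraction curates public
# comments before any LLM call. Topics and patterns are hypothetical.
import re

RULES = {
    "air_quality": re.compile(r"\b(smog|emissions?|air quality)\b", re.I),
    "traffic": re.compile(r"\b(congestion|traffic|parking)\b", re.I),
}

def tag_comment(comment: str) -> list[str]:
    # A comment may match several topics; untagged comments match none.
    return [topic for topic, pattern in RULES.items() if pattern.search(comment)]

def curate(comments: list[str]) -> dict[str, list[str]]:
    # Group comments by rule-matched topic. Comments matching no rule are
    # set aside rather than sent to the LLM, cutting cost.
    curated: dict[str, list[str]] = {topic: [] for topic in RULES}
    for comment in comments:
        for topic in tag_comment(comment):
            curated[topic].append(comment)
    return curated

comments = [
    "The emissions from the new plant worsen air quality downtown.",
    "Traffic congestion on Main St is unbearable.",
    "I love the new park benches!",
]
print(curate(comments))
```

Only the curated buckets would then be summarized or analyzed by the LLM, which is one way traditional text analytics can cut both cost and noise.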

Tom Sabo
4:25 PM, GMT
-
4:40 PM, GMT
Presentation
Scaling Large Language Models in Production

Open source models have made running your own LLM accessible to many people. It's pretty straightforward to set up a model like Mistral with a vector database and build your own RAG application.

But making it scale to high traffic demands is another story. LLM inference itself is slow, and GPUs are expensive, so we can't simply throw hardware at the problem. Once you add things like guardrails to your application, latencies compound.
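The basic RAG pattern the talk starts from can be sketched in a few lines: embed documents, retrieve the nearest ones for a query, and build an augmented prompt. This is a toy, self-contained version — the `embed` function is a bag-of-words stand-in for a real embedding model, and a production system would call a model such as Mistral behind an inference server plus a real vector database.

```python
# Minimal sketch of the RAG pattern: embed, retrieve, augment the prompt.
# embed() is a toy stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "GPUs are expensive and LLM inference is slow.",
    "Vector databases store embeddings for retrieval.",
    "Guardrails add latency to each request.",
]
print(build_prompt("Why is inference slow?", docs))
```

At high traffic, every stage of this pipeline (embedding, retrieval, generation, guardrails) adds latency, which is exactly the scaling problem the talk addresses.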

Matt Squire
4:40 PM, GMT
-
4:55 PM, GMT
Presentation
BAML: Beating OpenAI's Structured Outputs

We created a new programming language, BAML, that helps developers get higher-quality results out of any LLM. For example, in many scenarios, we can match GPT-4o performance with GPT-4o-mini using BAML. We'll discuss some of the algorithms BAML uses, how they improve model accuracy, and why function calling is both good and bad.
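The general idea behind structured outputs — which BAML builds on — can be sketched without BAML itself: describe the desired schema in the prompt, then parse the model's reply into a typed object, tolerating common formatting quirks. The schema, function names, and simulated reply below are illustrative assumptions, not BAML's actual API or algorithms.

```python
# Hedged sketch of the structured-output pattern: prompt with a schema
# hint, then leniently parse the reply into a typed object. All names
# here are illustrative; no real model is called.
import json
from dataclasses import dataclass

@dataclass
class Invoice:
    vendor: str
    total: float

# A schema hint like this would be appended to the model prompt.
SCHEMA_HINT = 'Reply with JSON: {"vendor": string, "total": number}'

def parse_invoice(reply: str) -> Invoice:
    # Be lenient: models often wrap JSON in markdown code fences.
    cleaned = reply.strip().strip("`").removeprefix("json").strip()
    data = json.loads(cleaned)
    return Invoice(vendor=data["vendor"], total=float(data["total"]))

# Simulated model reply (no API call in this sketch):
reply = '```json\n{"vendor": "Acme", "total": 42.5}\n```'
print(parse_invoice(reply))
```

Robust parsing of imperfect model output is one reason a smaller model can be made to behave like a larger one on structured tasks, which is the theme of this session.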

Vaibhav Gupta
4:55 PM, GMT
-
5:00 PM, GMT
Opening / Closing
Q&A and Wrap up
Ben Epstein
Tom Sabo
Matt Squire
Vaibhav Gupta
Event has finished
September 25, 4:00 PM, GMT
Online
Organized by
MLOps Community
SAS