LIVESTREAM

Boosting LLMs: Performance, Scaling, and Structured Outputs

# LLMs

# Performance and Scalability

# SAS

Join us for an in-depth exploration of advanced techniques to enhance the performance and scalability of large language models (LLMs). This event features three insightful sessions:

Learn how linguistic rule-based models and traditional text analytics improve LLM accuracy and efficiency in real-world applications like public comment analysis, focusing on data curation, fine-tuning, and prompt-tuning.

Discover strategies for scaling LLMs in high-traffic environments, including inference bottlenecks, cost optimization, and the latency trade-offs of adding guardrails.

Explore BAML, a new programming language that improves LLM output quality through purpose-built algorithms, enabling smaller, more cost-effective models to match the performance of top-tier ones.

Speakers

Ben Epstein

Co-Founder & CTO @ GrottoAI

Tom Sabo

Advisory Solutions Architect @ SAS

Matt Squire

CTO and Co-founder @ Fuzzy Labs

Vaibhav Gupta

CEO @ Boundary ML

Agenda

4:00–4:05 PM GMT

Opening / Closing

Introduction

4:05–4:25 PM GMT

Presentation

Bending the Rules: How to Use Information Extraction Models to Improve the Performance of Large Language Models

Generative AI and Large Language Models (LLMs) are revolutionizing technology and redefining what's possible with AI. Harnessing these transformative technologies requires careful data curation so that models perform both accurately and cost-effectively. Information extraction models, including linguistic rules and other traditional text analytics approaches, can be used to curate data for training, fine-tuning, and prompt-tuning, and to evaluate the results LLMs generate. By combining linguistic rule-based models with LLMs in this multi-modal approach to AI, we can improve the quality and accuracy of LLMs, helping them perform better on a range of tasks while cutting costs. We will demonstrate this approach with a real-world example in public comment analysis.

4:25–4:40 PM GMT

Presentation

Scaling Large Language Models in Production

Open source models have made running your own LLM accessible to many people. It's fairly straightforward to set up a model like Mistral alongside a vector database and build your own retrieval-augmented generation (RAG) application.

But scaling it to meet high-traffic demand is another story. LLM inference itself is slow, and GPUs are expensive, so we can't simply throw hardware at the problem. Once you add things like guardrails to your application, latencies compound.
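The point that latencies compound can be made concrete with simple arithmetic. The sketch below uses made-up per-stage timings (the stage names and millisecond values are illustrative assumptions, not measurements from the talk): when guardrails run in series with generation, their latencies add, and running an input check concurrently with retrieval hides only the shorter of the two.

```python
from __future__ import annotations

# Illustrative per-request stage latencies in milliseconds. These numbers
# are assumptions for the example, not benchmarks of any real system.
STAGES_MS: dict[str, float] = {
    "input_guardrail": 120,
    "retrieval": 40,
    "llm_generation": 900,
    "output_guardrail": 150,
}


def sequential_latency(stages: dict[str, float]) -> float:
    """Every stage waits for the one before it, so latencies simply add."""
    return sum(stages.values())


def latency_with_concurrent_input_check(stages: dict[str, float]) -> float:
    """Run the input guardrail alongside retrieval, then generate and check output.

    Only the slower of the two concurrent stages counts toward the total;
    generation and the output guardrail still run in series.
    """
    overlapped = max(stages["input_guardrail"], stages["retrieval"])
    return overlapped + stages["llm_generation"] + stages["output_guardrail"]


if __name__ == "__main__":
    base = STAGES_MS["retrieval"] + STAGES_MS["llm_generation"]
    print(f"no guardrails:          {base:.0f} ms")
    print(f"sequential guardrails:  {sequential_latency(STAGES_MS):.0f} ms")
    print(f"concurrent input check: {latency_with_concurrent_input_check(STAGES_MS):.0f} ms")
```

With these assumed numbers, guardrails run in series add 270 ms to a 940 ms baseline; overlapping the input check with retrieval recovers only 40 ms, because the output guardrail cannot start until generation finishes. Under real traffic, queueing for shared GPUs adds further delay on top of these per-request figures.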

4:40–4:55 PM GMT

Presentation

BAML: Beating OpenAI's Structured Outputs

We created a new programming language that helps developers get higher-quality results out of any LLM. For example, in many scenarios BAML lets GPT-4o-mini match GPT-4o's performance. We'll discuss some of the algorithms BAML uses, how they improve model accuracy, and the strengths and weaknesses of function calling.
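The abstract does not describe BAML's algorithms, so the following is only a generic illustration of why tolerant parsing can help smaller models produce usable structured output: models often wrap JSON in prose or a Markdown fence, or make small syntax slips, and a lenient parser can still recover a typed result where a strict `json.loads` call would fail. The `Resume` schema and the helper names are hypothetical, and this is not BAML's implementation.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass


@dataclass
class Resume:  # hypothetical target schema for the example
    name: str
    years_experience: int


def extract_json_object(text: str) -> dict:
    """Pull a JSON object out of free-form model output."""
    # Prefer the contents of a Markdown code fence if the model used one.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if fenced:
        candidate = fenced.group(1)
    else:
        # Otherwise take the span from the first "{" to the last "}".
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in model output")
        candidate = text[start : end + 1]
    # Drop a trailing comma before "}" or "]" (naive: ignores string contents).
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    return json.loads(candidate)


def parse_resume(text: str) -> Resume:
    data = extract_json_object(text)
    # Indexing raises KeyError if a required field is missing;
    # int() coerces a stringly-typed number such as "5" to fit the schema.
    return Resume(name=str(data["name"]), years_experience=int(data["years_experience"]))


if __name__ == "__main__":
    raw = 'Sure! Here is the data:\n```json\n{"name": "Ada", "years_experience": "5",}\n```'
    print(parse_resume(raw))
```

A tolerant parser trades strictness for recall, so it still needs to validate the result against the schema and fail loudly when required fields are absent rather than silently filling them in.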