LIVESTREAM
Productionalizing AI: Driving Innovation with Cost-Effective Strategies
# Efficient AI
# Cost-Effective Strategies
# AWS

Successfully deploying AI applications into production requires a strategic approach that prioritizes cost efficiency without compromising performance. In this one-hour mini-summit, we'll explore how to optimize costs across the key elements of AI development and deployment. Discover how AWS AI chips, Trainium and Inferentia, offer high-performance, cost-effective compute solutions for training and deploying foundation models.

Learn how Outerbounds' platform streamlines AI workflows and makes the most of underlying compute resources, ensuring efficient and cost-effective development.

Finally, explore the importance of domain adaptation and small language models in achieving high accuracy while reducing costs.

Join us to gain insights into the latest advancements in cost-efficient AI production and learn how to drive innovation while minimizing expenses.

Speakers
Ben Epstein
Founding Software Engineer @ Galileo
Eddie Mattia
Data Scientist @ Outerbounds
Julien Simon
Chief Evangelist @ Arcee AI
Scott Perry
Principal Solutions Architect, Annapurna ML @ AWS
Agenda
4:00 PM - 4:05 PM GMT
Opening / Closing
Introduction
Ben Epstein
4:05 PM - 4:20 PM GMT
Presentation
Scaling Up, Down, and All Around the AI Stack

The way companies build AI is at an inflection point. Over the last decade, millions of developers and organizations have cultivated the knowledge and skills to build AI systems. Yet many of today's most exciting models sit behind APIs that other companies control, with unappetizing cost and data privacy profiles. In this talk, we will explore how the AI/ML infrastructure stack is being affected by vendor APIs and by the growing desire and capacity of reasonable-scale companies to build powerful models in-house.

Eddie Mattia
4:20 PM - 4:35 PM GMT
Presentation
Tailoring Small Language Models for Enterprise Use Cases

After the initial excitement caused by the launch of closed large language models (LLMs), many organizations struggled to reach the quality and ROI targets required to deliver production-grade AI projects. Fortunately, the open-source community's rapid pace of innovation quickly made it possible to match, and even exceed, the accuracy of the best closed LLMs with nimble, cost-effective small language models (SLMs). This session covers the latest techniques for tailoring SLMs to specific domains and company knowledge, showing that real-life AI projects are well within reach. You will learn about the end-to-end model adaptation process, including continuous pre-training, model merging, instruction fine-tuning, quantization, and inference (an illustrative code sketch of the last two steps follows this agenda item).

Julien Simon
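
As a concrete taste of the quantization and inference steps mentioned in the abstract above, here is a minimal sketch that loads a small instruction-tuned model with 4-bit quantization and generates a reply. It assumes the Hugging Face transformers and bitsandbytes libraries and a CUDA GPU; the model id is an illustrative placeholder, not the workflow or model presented in the session.

```python
# Minimal illustrative sketch: 4-bit quantized inference with a small language model.
# Assumes transformers + bitsandbytes are installed and a CUDA GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder SLM; swap in your domain-adapted model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits at load time
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bf16 to preserve quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)

messages = [{"role": "user", "content": "Summarize our travel expense policy in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Loading the weights in 4 bits cuts their memory footprint roughly 4x compared with 16-bit weights, which is the main cost lever this particular step offers.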
4:41 PM - 4:50 PM GMT
Presentation
Enabling High Performance GenAI Workloads with AWS-designed ML Chips

In this talk, we will walk through the history of AWS-designed ML chips, their hardware and software stacks, and some of the most popular integrations and use cases that our customers care about. We will also look at recent case studies showing how AWS customers have achieved outstanding price-performance for their GenAI workloads using AWS Trainium and Inferentia.

Scott Perry
4:50 PM - 5:00 PM GMT
Opening / Closing
Q&A
5:00 PM - 5:10 PM GMT
1:1 networking
Networking
Ben Epstein
Eddie Mattia
Julien Simon
Scott Perry
October 30, 5:00 PM, GMT
Online
Organized by
MLOps Community
AWS