Exploring the Latency/Throughput & Cost Space for LLM Inference

Uploaded: 2023-10-09T08:31:54.182Z

Posted Oct 09, 2023 | Views 1.4K

# LLM Inference

# Latency

# Mistral.AI

Timothée Lacroix

CTO @ Mistral AI

Timothée Lacroix, aged 31, is Chief Technical Officer in charge of technical issues relating to product efficacy and research. A graduate of ENS rue d’Ulm in computer science and holder of a Master's degree in Mathematics Vision Learning from Paris Saclay, he began his career as an engineer at Facebook AI Research in 2015 in New York, where he completed his thesis between 2016 and 2019, in collaboration with École des Ponts, on tensor factorization for recommender systems. He continued his career at Meta, working with Guillaume Lample until 2023, when he co-founded Mistral AI.


Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. He is constantly learning and taking on new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building LEGO houses with his daughter.


SUMMARY

Getting the right LLM inference stack means choosing the right model for your task and running it on the right hardware with proper inference code. This talk will go through popular inference stacks and setups, detailing what makes inference costly. We'll talk about the current generation of open-source models and how to make the best use of them, but we will also touch on features currently missing from the open-source serving stack, as well as what future generations of models will unlock.


