MLOps Community

Exploring the Latency/Throughput & Cost Space for LLM Inference

Posted Oct 09, 2023 | Views 1K
# LLM Inference
# Latency
# Mistral.AI
SPEAKERS
Timothée Lacroix
CTO @ Mistral AI

Timothée Lacroix, aged 31, is Chief Technical Officer in charge of technical issues relating to product efficacy and research. A graduate of ENS rue d’Ulm in computer science and holder of a Master's degree in Mathematics, Vision, Learning from Paris-Saclay, he began his career as an engineer at Facebook AI Research in 2015 in New York, where he completed his thesis between 2016 and 2019, in collaboration with École des Ponts, on tensor factorization for recommender systems. He continued his career at Meta, working with Guillaume Lample until 2023, when he co-founded Mistral AI.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment, Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building LEGO houses with his daughter.

SUMMARY

Getting the right LLM inference stack means choosing the right model for your task and running it on the right hardware with proper inference code. This talk will go through popular inference stacks and setups, detailing what makes inference costly. We'll talk about the current generation of open-source models and how to make the best use of them, but we will also touch on features currently missing from the open-source serving stack, as well as what future generations of models will unlock.
