MLOps Community

Collections

All Collections
Data Engineering for AI/ML
35 Items
AIQCON SAN FRANCISCO 2024
Blog
MLOps IRL
MLOps Community Podcast
AI in Production
ROUNDtable
MLOps Community Mini Summit
LLMs in Production Conference Part III
LLMs in Production Conference Part II

All Content

Popular topics
# LLM in Production
# LLMs
# AI
# Rungalileo.io
# Machine Learning
# LLM
# Interview
# RAG
# Tecton.ai
# Arize.com
# mckinsey.com/quantumblack
# Redis.io
# Zilliz.com
# Humanloop.com
# Snorkel.ai
# Redis.com
# Wallaroo.ai
# MLOps
Korri Jones, Valdimar Eggertsson, Sophia Skowronski & 2 more speakers · Nov 5th, 2024
What is the Role of Small Models in the LLM Era: A Survey
The October 2024 MLOps reading group session explores the role and relevance of small language models in an era dominated by large language models (LLMs). The author of a recent survey paper on small models joins to discuss motivations for using smaller models, including resource constraints, efficiency, and the unique capabilities they bring to certain tasks. Key discussion points include the advantages of small models in specific contexts (e.g., edge devices and specialized tasks), their role in complementing large models, and emerging techniques for leveraging small models to enhance model efficiency and mitigate issues like out-of-vocabulary words. The group also touches on methods for compressing models and the challenges of balancing model size with generalization and task-specific performance.
# LLMs
# Small Language Models
# Specialized Tasks
58:44
Petar Tsankov & Demetrios Brinkmann · Nov 1st, 2024
Dive into AI risk and compliance. Petar Tsankov, a leader in AI safety, talks about turning complex regulations into clear technical requirements and the importance of benchmarks in AI compliance, especially with the EU AI Act. We explore his work with big AI players and the EU on safer, compliant models, covering topics from multimodal AI to managing AI risks. He also shares insights on COMPL-AI, an open-source tool for checking AI models against EU standards, making compliance simpler for AI developers. A must-listen for those tackling AI regulation and safety.
# EU AI Act
# AI regulation and safety
# LatticeFlow
58:01
Effective collaboration is crucial for building scalable ML and AI solutions in a rapidly evolving data engineering landscape. YouGot.us, in collaboration with MLOps.community, conducted a survey of over 200 participants in September 2024, revealing key challenges and practices in ML and data pipeline development.
# Effective collaboration
# Survey
# MLOps Community
# You.com
Tecton introduced new GenAI capabilities (in private preview) in the 1.0 release of its SDK that make it much easier to productionize RAG applications. This post shows how AI teams can use the SDK to build hyper-personalized chatbots via prompt enrichment and automated knowledge assembly (a generic sketch of the prompt-enrichment pattern follows this entry).
# LLMs
# Real-Time
# Tecton
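The post itself covers Tecton's SDK; as a loose illustration of the prompt-enrichment pattern it describes (not Tecton's actual API — every name below is a hypothetical stand-in), the core idea is to fetch fresh, user-specific features and inject them into the prompt before each LLM call:

```python
# Hypothetical sketch of prompt enrichment. get_user_features stands in
# for a feature-store lookup; this is NOT the Tecton SDK.

def get_user_features(user_id: str) -> dict:
    # In a real system this would query a feature store for fresh values.
    return {"favorite_cuisine": "thai", "last_order_days_ago": 3}

def build_prompt(user_id: str, question: str) -> str:
    features = get_user_features(user_id)
    facts = "\n".join(f"- {name}: {value}" for name, value in features.items())
    return (
        "You are a personalized shopping assistant.\n"
        f"Known facts about this user:\n{facts}\n\n"
        f"User question: {question}"
    )

print(build_prompt("user-42", "What should I order tonight?"))
```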
Limited memory capacity hinders the performance and potential of research and production environments utilizing Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques. This discussion explores how industry-standard CXL memory can be configured as a secondary, composable memory tier to alleviate this constraint. We will highlight some recent work we’ve done integrating this novel class of memory into LLM/RAG/vector database frameworks and workflows. Disaggregated shared memory is envisioned to offer high-performance, low-latency caches for model/pipeline checkpoints of LLM models, KV caches during distributed inference, LoRA adapters, and in-process data for heterogeneous CPU/GPU workflows. We expect to showcase these types of use cases in the coming months.
# Memory
# Checkpointing
# MemVerge
55:19
Gideon Mendels & Demetrios Brinkmann · Oct 18th, 2024
When building LLM applications, developers need to take a hybrid approach that draws on both ML and software engineering best practices. They need to define eval metrics and track their entire experimentation to see what is and is not working. They also need to define comprehensive unit tests for their particular use case so they can confidently check whether their LLM app is ready to be deployed (a minimal sketch of such a test follows this entry).
# LLMs
# Engineering best practices
# Comet ML
1:01:43
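As a minimal sketch of the use-case-specific unit tests the talk advocates (all names here are hypothetical; summarize stands in for the application's real LLM-backed function), deployment-readiness checks can be written as ordinary pytest tests:

```python
# Hypothetical sketch of unit tests for an LLM app. `summarize` is a
# stand-in so the file runs; a real project would call the model API.

def summarize(text: str) -> str:
    # Toy implementation: return the first sentence.
    return text.split(".")[0] + "."

def test_summary_is_shorter_than_input():
    text = "LLM apps need evals. They also need unit tests. Ship carefully."
    assert len(summarize(text)) < len(text)

def test_summary_keeps_key_terms():
    text = "The deploy failed because the GPU quota was exceeded."
    summary = summarize(text)
    # Cheap grounding check: key terms from the source should survive.
    assert "GPU" in summary or "quota" in summary
```

Run with pytest; in a real project each test would pin down a behaviour the product actually depends on (length limits, grounding, refusal handling) rather than these toy checks.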
In this MLOps Community podcast, Demetrios chats with Raj Rikhy, Principal Product Manager at Microsoft, about deploying AI agents in production. They discuss starting with simple tools, setting clear success criteria, and deploying agents in controlled environments for better scaling. Raj highlights real-time uses like fraud detection and optimizing inference costs with LLMs while stressing human oversight during early deployment to manage LLM randomness. The episode offers practical advice on deploying AI agents thoughtfully and efficiently, avoiding over-engineering and integrating AI into everyday applications.
# AI agents in production
# LLMs
# AI
49:13
Jelmer Borst, Daniela Solis & Demetrios Brinkmann · Oct 8th, 2024
Like many companies, Picnic started out with a small, central data science team. As this team grows larger and focuses on more complex models, questions arise about skill sets and organisational setup: Use an ML platform, or build one ourselves? A central team or embedded teams? Hire data scientists, ML engineers, or MLOps engineers? How do we foster a team culture of end-to-end ownership? How do we balance short-term and long-term impact?
# Recruitment
# Growth
# Picnic
57:50
Francisco Ingham & Demetrios Brinkmann · Oct 4th, 2024
Being LLM-native is becoming one of the key differentiators among companies in vastly different verticals. Everyone wants to use LLMs, and everyone wants to be on top of the current tech, but what does it really mean to be LLM-native? LLM-native involves two ends of a spectrum. On the one hand, there is the product or service the company offers, which presents many automation opportunities; LLMs can be applied strategically to scale at a lower cost and offer a better experience for users. But being LLM-native involves not only the company's customers; it also involves every stakeholder in the company's operations. How can employees integrate LLMs into their daily workflows? How can we as developers leverage the advancements in the field not only as builders but as adopters? We will tackle these and other key questions for anyone looking to capitalize on the LLM wave, prioritizing real results over the hype.
# LLM-native
# RAG
# Pampa Labs
56:14
Tom Sabo, Matt Squire, Vaibhav Gupta & 1 more speaker · Oct 3rd, 2024
Bending the Rules: How to Use Information Extraction Models to Improve the Performance of Large Language Models
Generative AI and Large Language Models (LLMs) are revolutionizing technology and redefining what's possible with AI. Harnessing the power of these transformative technologies requires careful curation of data to perform in both cost-effective and accurate ways. Information extraction models, including linguistic rules and other traditional text analytics approaches, can be used to curate data and aid in training, fine-tuning, and prompt-tuning, as well as in evaluating the results generated by LLMs. By combining linguistic rule-based models with LLMs in this hybrid approach, we can improve the quality and accuracy of LLMs and enable them to perform better on various tasks while cutting costs. We will demonstrate this innovation with a real-world example in public comment analysis (a rough sketch of rule-based extraction follows this entry).

Scaling Large Language Models in Production
Open-source models have made running your own LLM accessible to many people. It's pretty straightforward to set up a model like Mistral with a vector database and build your own RAG application. But making it scale to high traffic demands is another story. LLM inference itself is slow, and GPUs are expensive, so we can't simply throw hardware at the problem. Once you add things like guardrails to your application, latencies compound.

BAML: Beating OpenAI's Structured Outputs
We created a new programming language that helps developers using LLMs get higher-quality results out of any model. For example, in many scenarios, we can match GPT-4o performance with GPT-4o-mini using BAML. We'll discuss some of the algorithms BAML uses, how they improve the accuracy of models, and why function calling is both good and bad.
# LLMs
# RAG
# BAML
# SAS
1:01:23
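As a rough, hypothetical illustration of the first talk's idea of pairing linguistic rules with LLMs (plain regex rules here, not SAS's information extraction models and not BAML), rule-based extraction can pre-structure raw public comments so only relevant ones reach the expensive LLM pass:

```python
# Hypothetical sketch: regex-based information extraction that curates
# public comments before any LLM is involved. Rules are illustrative only.
import re

TOPIC_RULES = {
    "zoning": re.compile(r"\b(zoning|rezon\w+|land use)\b", re.IGNORECASE),
    "traffic": re.compile(r"\b(traffic|congestion|parking)\b", re.IGNORECASE),
}

def extract(comment: str) -> dict:
    topics = [name for name, rule in TOPIC_RULES.items() if rule.search(comment)]
    return {"topics": topics, "comment": comment}

comments = [
    "The rezoning proposal will worsen traffic and parking downtown.",
    "I support the new park; no concerns.",
]
structured = [extract(c) for c in comments]
# Only comments that match at least one rule need the costly LLM pass.
needs_llm = [s for s in structured if s["topics"]]
print(needs_llm)
```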