Luca Fiaschi & Demetrios Brinkmann · Apr 15th, 2025
Traditional product development cycles require extensive consumer research and market testing, resulting in lengthy development timelines and significant resource investment. We've transformed this process by building a distributed multi-agent system that enables parallel quantitative evaluation of hundreds of product concepts. Our system combines three key components: an agentic innovation lab generating high-quality product concepts, synthetic consumer panels built on fine-tuned foundation models validated against historical data, and an evaluation framework that correlates with real-world testing outcomes. We can talk about how this architecture enables rapid concept discovery and digital experimentation, delivering insights into product success probability before development begins. Through case studies and technical deep dives, you'll learn how we built an AI-powered innovation lab that compresses months of product development and testing into minutes, without sacrificing the accuracy of insights.
# Gen AI
# AI Agents
# PyMC Labs
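To make the parallel-evaluation idea concrete, here is a minimal sketch. Everything in it is hypothetical, not PyMC Labs' actual system: `synthetic_panel_score` is a deterministic stand-in for a fine-tuned foundation model that would be queried once per simulated panelist, and the concepts and scoring scale are placeholders.

```python
# Hypothetical sketch: score many product concepts in parallel with a
# synthetic consumer panel. The scorer below is a deterministic stub; a real
# system would call a fine-tuned model once per simulated panelist.
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

CONCEPTS = [
    "self-chilling coffee mug",
    "modular desk organizer",
    "plant-based protein snack bar",
]

def synthetic_panel_score(concept: str, panel_size: int = 25) -> float:
    """Return a purchase-intent score in [0, 1] averaged over a simulated panel."""
    votes = [(hash((concept, i)) % 100) / 100 for i in range(panel_size)]
    return mean(votes)

# Evaluate all concepts concurrently instead of one market test at a time.
with ThreadPoolExecutor(max_workers=8) as pool:
    scores = dict(zip(CONCEPTS, pool.map(synthetic_panel_score, CONCEPTS)))

for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {concept}")
```

The structural point is the fan-out: because each panel simulation is independent, hundreds of concepts can be ranked in the time a single real-world test would take.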

Shwetank Kumar · Apr 15th, 2025
AI is heading for an energy crisis, with data centers projected to consume as much electricity as France by 2027. Big Tech's current solution—building more power plants—is unsustainable. Real solutions lie in energy-efficient computing (like in-memory and analog) and shifting AI to edge devices. Without these, AI’s progress risks being bottlenecked by electricity limits.
# Energy Crisis
# Edge AI
# Climate Change


Josh Xi & Demetrios Brinkmann · Apr 11th, 2025
In real-time forecasting (e.g., geohash-level demand and supply forecasts for an entire region), time series-based forecasting methods are widely adopted due to their simplicity and ease of training. This discussion explores how Lyft uses time series forecasting to respond to real-time market dynamics, covering practical tips and tricks for implementing these methods, an in-depth look at their adaptability for online re-training, and discussions of their interpretability and user intervention capabilities. By examining these topics, listeners will understand how time series forecasting can outperform DNNs, and how to effectively use time series forecasting for dynamic market conditions and decision-making applications.
# Time Series
# DNNs
# Lyft
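The "cheap to retrain online" property the episode emphasizes can be illustrated with a toy forecaster. This is simple exponential smoothing, not Lyft's production model; the smoothing factor and demand series are made up.

```python
# Minimal sketch of an online-updatable forecaster (simple exponential
# smoothing). Each update is O(1), which is what makes continuous online
# re-training practical; the state is one number a human can inspect or
# override, which is the interpretability/intervention angle above.

class OnlineSES:
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha   # smoothing factor: higher = react faster to new data
        self.level = None    # current smoothed estimate

    def update(self, y: float) -> None:
        """Fold one new observation into the state."""
        if self.level is None:
            self.level = y
        else:
            self.level = self.alpha * y + (1 - self.alpha) * self.level

    def forecast(self) -> float:
        """One-step-ahead forecast is just the current level."""
        return self.level

# Example: demand counts for one geohash arriving minute by minute.
model = OnlineSES(alpha=0.4)
for demand in [12, 15, 14, 22, 30, 28]:
    model.update(demand)
    print(f"observed={demand:2d}  next-step forecast={model.forecast():.1f}")
```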



Sophia Skowronski, Adam Becker & Valdimar Eggertsson · Apr 9th, 2025
We break down key insights from the paper, discuss what these findings mean for AI’s role in the workforce, and debate its broader implications. As always, our expert moderators guide the session, followed by an open, lively discussion where you can share your thoughts, ask questions, and challenge ideas with fellow MLOps enthusiasts.
# Generative AI
# Claude
# Hierarchical Taxonomy


Tanmay Chopra & Demetrios Brinkmann · Apr 8th, 2025
Finetuning is dead. Finetuning is only for style. We've all heard these claims. But the truth is we feel this way because all we've been doing is extended pretraining. I'm excited to chat about what real finetuning looks like: modifying output heads, loss functions, and model layers, and its implications for quality and latency. Happy to dive deeper into how DeepSeek leveraged this real version of finetuning through GRPO, and how this is nothing more than a rediscovery of our old finetuning ways. I'm sure we'll naturally also dive into when developing and deploying your own specialized models makes sense and the challenges you face when doing so.
# Finetuning
# DeepSeek
# Emissary
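A minimal sketch of what "modifying output heads and loss functions" means in practice, under toy assumptions: the backbone below is a stand-in for a pretrained transformer, and the dimensions, label count, and hyperparameters are placeholders rather than any particular recipe.

```python
# Hypothetical sketch of "real" finetuning: replace the output head of a
# pretrained backbone and train with a task-specific loss, rather than just
# continuing next-token pretraining.
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Stand-in for a pretrained encoder that returns hidden states."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(16, hidden), nn.ReLU())

    def forward(self, x):
        return self.encoder(x)

backbone = Backbone()
num_labels = 3
head = nn.Linear(64, num_labels)   # new task head replaces the LM head
loss_fn = nn.CrossEntropyLoss()    # task loss replaces the next-token loss

# Freezing the backbone and training only the small head is one way such
# specialization can also help latency; unfreeze layers for more headroom.
for p in backbone.parameters():
    p.requires_grad = False

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
x, y = torch.randn(8, 16), torch.randint(0, num_labels, (8,))
logits = head(backbone(x))
loss = loss_fn(logits, y)
loss.backward()
opt.step()
print(f"task loss: {loss.item():.3f}")
```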

Rafał Siwek · Apr 7th, 2025
This third article in the series on Distributed MLOps explores overcoming vendor lock-in by unifying AMD and NVIDIA GPUs in mixed clusters for distributed PyTorch training, all without requiring code rewrites:
- Mixing GPU Vendors: It demonstrates how to combine AWS g4ad (AMD) and g4dn (NVIDIA) instances, bridging ROCm and CUDA to avoid being tied to a single vendor.
- High-Performance Communication: It highlights the use of UCC and UCX to enable efficient operations like all_reduce and all_gather, ensuring smooth and synchronized training across diverse GPUs.
- Kubernetes Made Simple: It shows how Kubernetes, enhanced by Volcano for gang scheduling, can orchestrate these workloads on heterogeneous GPU setups.
- Real-World Trade-Offs: While covering techniques like dynamic load balancing and gradient compression, it also notes the current challenges and limitations.
Overall, the piece illustrates how integrating mixed hardware can maximize resource potential, delivering faster, scalable, and cost-effective machine learning training, as the sketch below illustrates.
# MLOps
# Machine Learning
# Kubernetes
# PyTorch
# AWS
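A minimal sketch of the vendor-agnostic idea, with assumptions stated: on NVIDIA builds of PyTorch the "nccl" backend uses NCCL, while ROCm builds route the same string to RCCL, so the training script needs no vendor branching; the "ucc" backend the article pairs with UCX is only available in PyTorch builds compiled with UCC support. The `PT_BACKEND` variable is my own placeholder, not part of PyTorch.

```python
# Sketch: one all_reduce script that runs unchanged on AMD or NVIDIA GPUs.
# Launch with, e.g.: torchrun --nproc_per_node=2 allreduce_demo.py
import os
import torch
import torch.distributed as dist

# Pick the backend at launch time; the Python code is identical either way.
backend = os.environ.get("PT_BACKEND", "gloo")  # gloo | nccl (RCCL on ROCm) | ucc
dist.init_process_group(backend=backend)
rank = dist.get_rank()

if backend != "gloo" and torch.cuda.is_available():
    # torch.cuda covers ROCm devices too in ROCm builds of PyTorch.
    torch.cuda.set_device(rank % torch.cuda.device_count())
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Each rank contributes its rank id; after all_reduce every rank holds the sum.
t = torch.tensor([float(rank)], device=device)
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {rank}: all_reduce sum = {t.item()}")

dist.destroy_process_group()
```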


David Cox & Demetrios Brinkmann · Apr 7th, 2025
Shiny new objects are made available to artificial intelligence (AI) practitioners daily. For many who are not AI practitioners, the release of ChatGPT in 2022 was their first contact with modern AI technology. This led to a flurry of funding and excitement around how AI might improve their bottom line. Two years on, the novelty of AI has worn off for many companies, but AI remains a strategic initiative. This shift has led to two patterns that suggest a maturation of the AI conversation across industries. First, conversations seem to be pivoting from "Are we doing [the shiny new thing]?" to serious analysis of the ROI of things already built. This reframe places less emphasis on adopting new technologies for their own sake and more on the optimal stack to maximize return relative to cost. Second, conversations are shifting to emphasize market differentiation. That is, anyone can build products that wrap around LLMs. In competitive markets, creating products and functionality that all your competitors can also build is a poor business strategy (unless having a particular thing is industry standard). Creating a competitive advantage requires companies to think strategically about their unique data assets and what they can build that their competitors cannot.
# AI
# LLM
# RethinkFirst


Rohit Agrawal & Demetrios Brinkmann · Apr 4th, 2025
Demetrios talks with Rohit Agrawal, Director of Engineering at Tecton, about the challenges and future of streaming data in ML. Rohit shares his path at Tecton and insights on managing real-time and batch systems. They cover tool fragmentation (Kafka, Flink, etc.), infrastructure costs, managed services, and trends like using S3 for storage and Iceberg as the GitHub for data. The episode wraps with thoughts on BYOC solutions and evolving data architectures.
# Batch Systems
# Cost Management
# Streaming Ecosystem
# Tecton

Kilian Lieret · Apr 2nd, 2025
As language models have advanced, they have moved beyond code completion and are beginning to tackle software engineering tasks in a more autonomous, agentic way. However, evaluating agentic capabilities is challenging. To address this, we first introduce SWE-bench, a benchmark built from real GitHub issues that has become the standard for assessing AI’s ability to resolve complex software tasks in large codebases. We will discuss the current state of the field, the limitations of today’s models, and how far we still are from truly autonomous AI developers. Next, we will explore the fundamentals of agents based on SWE-agent, a simple yet powerful agent framework designed for software engineering but adaptable to a variety of domains. By the end of this talk, you will have an understanding of the current frontier of agentic AI in software engineering, the challenges ahead, and various tips and tricks on optimizing AI agents for tool use and iterative problem solving of reasoning-heavy tasks.
# Code Completion
# Autonomous Agents
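The agent pattern the talk builds on can be shown in a few lines. This is a generic observe-act loop, not SWE-agent's actual interface: the tool names are toys and `policy` is a hard-coded stub standing in for an LLM.

```python
# Minimal sketch of an agent loop: the model observes the environment,
# picks a tool call, and iterates until it decides it is done.
TOOLS = {
    "search": lambda arg: f"3 matches for '{arg}' in repo",
    "open_file": lambda arg: f"contents of {arg}: ...",
    "submit": lambda arg: "patch submitted",
}

def policy(history):
    """Stand-in for the LLM: choose the next (tool, argument) from history."""
    steps = [("search", "IndexError"), ("open_file", "utils.py"), ("submit", "fix")]
    return steps[len(history)]

history = []
while True:
    tool, arg = policy(history)
    observation = TOOLS[tool](arg)       # execute the tool, observe the result
    history.append((tool, arg, observation))
    print(f"{tool}({arg!r}) -> {observation}")
    if tool == "submit":                 # the agent decides it is finished
        break
```

Benchmarks like SWE-bench then score the loop end to end: did the submitted patch actually resolve the GitHub issue?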

Enrique Ferrao · Apr 2nd, 2025
The open-source AI ecosystem is evolving rapidly, with frequent releases of new models, architectures, and hardware accelerators. While this flexibility drives innovation, it also introduces significant hidden complexities when fine-tuning and deploying AI models. In this talk, we’ll explore the key challenges teams face when updating fine-tuned models, switching between inference engines, and deploying across different GPUs such as the NVIDIA A100, L40S, and H100, and the Intel Gaudi 2. We’ll share real-world examples, including tokenizer issues, multi-GPU fine-tuning hurdles, and API inconsistencies across AI components.
# Open Source AI
# GPU
# Fine Tuning
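One of the failure modes named above, tokenizer drift between a base model and a fine-tuned checkpoint (or between two serving stacks), can be caught with a cheap consistency check before deployment. A sketch follows; the model identifiers are placeholders, and the probe strings are arbitrary.

```python
# Sketch of a pre-deployment tokenizer consistency check: encode a probe set
# with both tokenizers and fail fast on any divergence.
from transformers import AutoTokenizer

BASE = "org/base-model"             # placeholder identifiers
TUNED = "org/base-model-finetuned"

tok_a = AutoTokenizer.from_pretrained(BASE)
tok_b = AutoTokenizer.from_pretrained(TUNED)

probes = ["hello world", "def f(x): return x", "¿dónde está?", "<s>[INST]"]
for text in probes:
    a, b = tok_a.encode(text), tok_b.encode(text)
    assert a == b, f"tokenization drift on {text!r}: {a} vs {b}"

assert tok_a.vocab_size == tok_b.vocab_size, "vocab size mismatch"
print("tokenizers agree on probe set")
```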