MLOps Community
Duncan Curtis
Demetrios Brinkmann
Duncan Curtis & Demetrios Brinkmann · Apr 18th, 2025
How Sama is Improving ML Models to Make AVs Safer
Between Uber’s partnership with NVIDIA and speculation that U.S. President Donald Trump will enact policies allowing fully autonomous vehicles, it’s more important than ever to ensure the accuracy of machine learning models. Yet public confidence in AVs is shaky because of high-profile accidents caused by gaps in the technology, gaps that Sama is looking to fill. Duncan Curtis, SVP of Product and Technology at Sama and one of the industry’s top leaders, shares how to improve the accuracy, speed, and cost-efficiency of ML algorithms for AVs. Sama’s machine learning technologies minimize the risk of model failure and lower the total cost of ownership for car manufacturers including Ford, BMW, and GM, as well as four of the five top OEMs and their Tier 1 suppliers. The conversation is especially timely: Tesla is under investigation for crashes involving its Smart Summon feature, and Waymo recently had a passenger trapped in one of its driverless taxis.
# ML algorithms
# AVs
# Sama
45:35
Vaibhav Gupta
Charles Frye
Ben Epstein
Vaibhav Gupta, Charles Frye & Ben Epstein · Apr 17th, 2025
Breaking the Demo Barrier and Getting Agents Shipped
Deploying Large Language Models (LLMs) in production brings a host of challenges well beyond prompt engineering. Once they're live, even the smallest oversight, like a malformed API call or unexpected user input, can cause failures you never saw coming. In this talk, Vaibhav Gupta will share proven strategies and practical tooling to keep LLMs robust in real-world environments. You'll learn about structured prompting, dynamic routing with fallback handlers, and data-driven guardrails, all aimed at catching errors before they break your application. You'll also hear why the naïve use of JSON can reduce a model's accuracy, and discover when it's wise to push back on standard serialization in favor of more flexible output formats. Whether you're processing 100+ page bank statements, analyzing user queries, or summarizing critical healthcare data, you'll not only understand how to prevent LLMs from failing but also how to design AI-driven solutions that scale gracefully alongside evolving user needs.
Modal: ML Infra That Does Not Suck
Building an application on the cloud doesn't have to suck. Even if it uses GPUs and foundation models! In this talk, I'll present Modal, the serverless Python infrastructure you didn't know you always wanted.
# AI Systems
# Frameworks
# BAML
# Modal
1:08:10
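The "dynamic routing with fallback handlers" pattern the abstract mentions can be sketched in a few lines. Everything below is a hypothetical illustration, not BAML or Modal API code: `primary_model`, `fallback_handler`, and `route` are stand-ins for a real LLM call, a deterministic last resort, and the routing layer between them.

```python
# Hypothetical sketch of dynamic routing with a fallback handler: try a
# primary model, validate its output, and fall back when the call fails
# or the output does not pass the guardrail checks.
def primary_model(query: str) -> str:
    # Stand-in for a real LLM call; here it fails on empty input.
    if not query.strip():
        raise ValueError("empty query")
    return f"primary answer to: {query}"

def fallback_handler(query: str) -> str:
    # Deterministic last resort so the application never hard-fails.
    return "Sorry, I couldn't process that request."

def route(query: str, validators=(lambda out: len(out) > 0,)) -> str:
    try:
        out = primary_model(query)
        if all(check(out) for check in validators):  # data-driven guardrail
            return out
    except Exception:
        pass
    return fallback_handler(query)

print(route("what is MLOps?"))  # primary path
print(route("   "))             # falls back
```

The `validators` tuple is where the talk's "catch errors before they break your application" idea lives: any output that fails a check is treated the same as an exception.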
Traditional product development cycles require extensive consumer research and market testing, resulting in lengthy development timelines and significant resource investment. We've transformed this process by building a distributed multi-agent system that enables parallel quantitative evaluation of hundreds of product concepts. Our system combines three key components: an agentic innovation lab generating high-quality product concepts, synthetic consumer panels using fine-tuned foundational models validated against historical data, and an evaluation framework that correlates with real-world testing outcomes. We can talk about how this architecture enables rapid concept discovery and digital experimentation, delivering insights into product success probability before development begins. Through case studies and technical deep-dives, you'll learn how we built an AI-powered innovation lab that compresses months of product development and testing into minutes, without sacrificing the accuracy of insights.
# Gen AI
# AI Agents
# PyMC Labs
1:00:44
AI is heading for an energy crisis, with data centers projected to consume as much electricity as France by 2027. Big Tech's current solution—building more power plants—is unsustainable. Real solutions lie in energy-efficient computing (like in-memory and analog) and shifting AI to edge devices. Without these, AI’s progress risks being bottlenecked by electricity limits.
# Energy Crisis
# Edge AI
# Climate Change
Josh Xi
Demetrios Brinkmann
Josh Xi & Demetrios Brinkmann · Apr 11th, 2025
In real-time forecasting (e.g., geohash-level demand and supply forecasts for an entire region), time-series forecasting methods are widely adopted due to their simplicity and ease of training. This discussion explores how Lyft uses time series forecasting to respond to real-time market dynamics, covering practical tips and tricks for implementing these methods, an in-depth look at their adaptability for online re-training, and discussions on their interpretability and user intervention capabilities. By examining these topics, listeners will understand how time series forecasting can outperform DNNs, and how to effectively use time series forecasting for dynamic market conditions and decision-making applications.
# Time Series
# DNNs
# Lyft
53:42
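A minimal sketch of the kind of lightweight time-series method the episode contrasts with DNNs. Simple exponential smoothing is assumed here purely as an illustration (the blurb does not describe Lyft's actual models); its one-line update rule is what makes online re-training cheap.

```python
# Simple exponential smoothing: the level is updated in place after each
# observation, so "re-training" on a new data point is a single assignment.
def exponential_smoothing(series, alpha=0.5):
    """Return one-step-ahead forecasts for each point in `series`."""
    level = series[0]
    forecasts = []
    for y in series:
        forecasts.append(level)                    # forecast made before seeing y
        level = alpha * y + (1 - alpha) * level    # online update
    return forecasts

demand = [10, 12, 11, 15, 14]   # toy per-interval demand counts
print(exponential_smoothing(demand, alpha=0.5))
```

Unlike a DNN, the model state is one number per series, so it is trivially interpretable and a human can intervene by resetting the level or adjusting `alpha` directly.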
Sophia Skowronski
Adam Becker
Valdimar Eggertsson
Sophia Skowronski, Adam Becker & Valdimar Eggertsson · Apr 9th, 2025
We break down key insights from the paper, discuss what these findings mean for AI’s role in the workforce, and debate its broader implications. As always, our expert moderators guide the session, followed by an open, lively discussion where you can share your thoughts, ask questions, and challenge ideas with fellow MLOps enthusiasts.
# Generative AI
# Claude
# Hierarchical Taxonomy
55:09
Tanmay Chopra
Demetrios Brinkmann
Tanmay Chopra & Demetrios Brinkmann · Apr 8th, 2025
Finetuning is dead. Finetuning is only for style. We've all heard these claims. But the truth is we feel this way because all we've been doing is extended pretraining. I'm excited to chat about what real finetuning looks like - modifying output heads, loss functions, and model layers - and its implications for quality and latency. Happy to dive deeper into how DeepSeek leveraged this real version of finetuning through GRPO and how this is nothing more than a rediscovery of our old finetuning ways. I'm sure we'll naturally also dive into when developing and deploying specialized models makes sense and the challenges you face when doing so.
# Finetuning
# DeepSeek
# Emissary
1:00:31
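What "modifying output heads and loss functions" looks like mechanically can be sketched in PyTorch. This is a hypothetical toy backbone, not DeepSeek's setup or GRPO: a small encoder gets a task-specific classification head and a label loss in place of next-token prediction.

```python
import torch
import torch.nn as nn

# Hypothetical tiny backbone standing in for a pretrained LLM
# (assumption: 32-dim hidden states, 100-token vocabulary).
hidden_dim, vocab_size, num_labels = 32, 100, 3

backbone = nn.Sequential(
    nn.Embedding(vocab_size, hidden_dim),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True),
        num_layers=1,
    ),
)

# "Real" finetuning step 1: replace the LM head with a task-specific output head.
classifier_head = nn.Linear(hidden_dim, num_labels)

# Step 2: replace the loss, e.g. next-token cross-entropy -> label cross-entropy.
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 16))   # batch of 8 sequences, length 16
labels = torch.randint(0, num_labels, (8,))

features = backbone(tokens).mean(dim=1)          # mean-pool token states to one vector
logits = classifier_head(features)
loss = loss_fn(logits, labels)
loss.backward()                                  # gradients flow into head AND backbone

print(logits.shape)  # torch.Size([8, 3])
```

The latency implication the speaker alludes to falls out directly: the new head emits `num_labels` logits in one forward pass instead of generating tokens autoregressively.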
This third article in the series on Distributed MLOps explores overcoming vendor lock-in by unifying AMD and NVIDIA GPUs in mixed clusters for distributed PyTorch training, all without requiring code rewrites:
Mixing GPU Vendors: It demonstrates how to combine AWS g4ad (AMD) and g4dn (NVIDIA) instances, bridging ROCm and CUDA to avoid being tied to a single vendor.
High-Performance Communication: It highlights the use of UCC and UCX to enable efficient operations like all_reduce and all_gather, ensuring smooth and synchronized training across diverse GPUs.
Kubernetes Made Simple: It shows how Kubernetes, enhanced by Volcano for gang scheduling, can orchestrate these workloads on heterogeneous GPU setups.
Real-World Trade-Offs: While covering techniques like dynamic load balancing and gradient compression, it also notes current limitations.
Overall, the piece illustrates how integrating mixed hardware can maximize resource potential, delivering faster, scalable, and cost-effective machine learning training.
# MLOps
# Machine Learning
# Kubernetes
# PyTorch
# AWS
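The `all_reduce` collective the article highlights can be sketched with `torch.distributed`. As an assumption, this sketch uses the CPU-only `gloo` backend in a single-process world so it runs anywhere; the article's mixed-GPU setup swaps in UCC/UCX as the communication layer, while the calling code stays the same.

```python
import os
import torch
import torch.distributed as dist

# Minimal sketch (assumption: single-process "world", gloo backend) of the
# collective at the heart of data-parallel training: each worker holds local
# gradients, and all_reduce leaves every rank holding their element-wise sum.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

local_grads = torch.tensor([1.0, 2.0, 3.0])      # this rank's gradient shard
dist.all_reduce(local_grads, op=dist.ReduceOp.SUM)  # in-place across all ranks
print(local_grads)  # with world_size=1, unchanged; with N ranks, the sum

dist.destroy_process_group()
```

Because the collective is an abstraction over the backend, the same `all_reduce` call synchronizes gradients whether the ranks sit on ROCm or CUDA devices, which is exactly what makes the mixed-vendor cluster transparent to training code.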
David Cox
Demetrios Brinkmann
David Cox & Demetrios Brinkmann · Apr 7th, 2025
Shiny new objects are made available to artificial intelligence (AI) practitioners daily. For many who are not AI practitioners, the release of ChatGPT in 2022 was their first contact with modern AI technology. This led to a flurry of funding and excitement around how AI might improve their bottom line. Two years on, the novelty of AI has worn off for many companies, but AI remains a strategic initiative. This strategic nuance has led to two patterns that suggest a maturation of the AI conversation across industries. First, conversations seem to be pivoting from "Are we doing [the shiny new thing]?" to serious analysis of the ROI of things built. This reframe places less emphasis on simply adopting new technologies for the sake of doing so and more emphasis on the optimal stack to maximize return relative to cost. Second, conversations are shifting to emphasize market differentiation. That is, anyone can build products that wrap around LLMs. In competitive markets, creating products and functionality that all your competitors can also build is a poor business strategy (unless having a particular thing is industry standard). Creating a competitive advantage requires companies to think strategically about their unique data assets and what they can build that their competitors cannot.
# AI
# LLM
# RethinkFirst
40:51
Rohit Agrawal
Demetrios Brinkmann
Rohit Agrawal & Demetrios Brinkmann · Apr 4th, 2025
Demetrios talks with Rohit Agrawal, Director of Engineering at Tecton, about the challenges and future of streaming data in ML. Rohit shares his path at Tecton and insights on managing real-time and batch systems. They cover tool fragmentation (Kafka, Flink, etc.), infrastructure costs, managed services, and trends like using S3 for storage and Iceberg as the GitHub for data. The episode wraps with thoughts on BYOC solutions and evolving data architectures.
# Batch Systems
# Cost Management
# Streaming Ecosystem
# Tecton
47:39