MLOps Community
Blog
The Terawatt Time Bomb: Transformers, Trouble, and the Analog In-Memory Compute Fix
Shwetank Kumar · Apr 15th, 2025

AI is heading for an energy crisis, with data centers projected to consume as much electricity as France by 2027. Big Tech's current solution of building more power plants is unsustainable. Real solutions lie in energy-efficient computing (like in-memory and analog) and shifting AI to edge devices. Without these, AI's progress risks being bottlenecked by electricity limits.
# Energy Crisis
# Edge AI
# Climate Change
Rafał Siwek · Apr 7th, 2025
This third article in the series on Distributed MLOps explores overcoming vendor lock-in by unifying AMD and NVIDIA GPUs in mixed clusters for distributed PyTorch training, all without requiring code rewrites:
- Mixing GPU Vendors: It demonstrates how to combine AWS g4ad (AMD) and g4dn (NVIDIA) instances, bridging ROCm and CUDA to avoid being tied to a single vendor.
- High-Performance Communication: It highlights the use of UCC and UCX to enable efficient collective operations like all_reduce and all_gather (sketched after this entry's tags), ensuring smooth, synchronized training across diverse GPUs.
- Kubernetes Made Simple: It shows how Kubernetes, enhanced by Volcano for gang scheduling, can orchestrate these workloads on heterogeneous GPU setups.
- Real-World Trade-Offs: While covering techniques like dynamic load balancing and gradient compression, it also notes current limitations and open challenges.
Overall, the piece illustrates how integrating mixed hardware can maximize resource potential, delivering faster, scalable, and cost-effective machine learning training.
# MLOps
# Machine Learning
# Kubernetes
# PyTorch
# AWS
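To make the cross-vendor collective idea concrete, here is a minimal sketch (not code from the article) of a backend-agnostic all_reduce with torch.distributed. It assumes a PyTorch build where the experimental UCC backend is compiled in; everything else follows standard torchrun conventions.

```python
# Minimal sketch of a backend-agnostic collective with torch.distributed.
# Assumption: a PyTorch build with the experimental UCC backend available;
# on a stock build, swap "ucc" for "nccl" (NVIDIA-only) or "gloo" (CPU).
# Launch one process per node with torchrun, which sets RANK/WORLD_SIZE.
import os
import torch
import torch.distributed as dist

def main() -> None:
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    # The same script runs unchanged on ROCm (g4ad) and CUDA (g4dn) nodes;
    # the process group hides the vendor difference behind the backend.
    dist.init_process_group(backend="ucc", rank=rank, world_size=world_size)

    # Each rank contributes a gradient-like tensor...
    local = torch.ones(4) * (rank + 1)
    # ...and all_reduce sums it across every participant.
    dist.all_reduce(local, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {local.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```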
Adel Zaalouk · Mar 31st, 2025
The article argues that despite advancements in Large Language Models (LLMs), their limitations, such as knowledge cut-offs and the potential for hallucinations, necessitate the use of RAG. RAG addresses these limitations by combining the internal knowledge of LLMs (parametric memory) with external knowledge (non-parametric memory). The core of RAG involves a Retriever to fetch relevant information and a Generator to produce a response using this retrieved context (a toy version of this flow is sketched below). While fine-tuning has traditionally focused on the generator, the original concept of RAG included end-to-end fine-tuning of both components, and fine-tuning embedding models is crucial for improving retrieval accuracy. The post also clarifies that long-context models do not negate the need for RAG, as retrieval helps focus the model on relevant information. Furthermore, the emergence of Agentic RAG extends RAG's capabilities for more complex tasks by enabling multi-step retrieval and interaction with various tools. The choice between standard RAG and Agentic RAG depends on the complexity of the queries and the number of knowledge sources required. Ultimately, the article emphasizes that optimizing the entire RAG system, including fine-tuning the retriever, is key to its enduring relevance.
# RAG
# AI
# Retrieval
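To make the Retriever/Generator split described above concrete, here is a minimal, self-contained sketch of the retrieve-then-generate control flow. The embedder and generator are deliberately toy stand-ins (bag-of-words counts and a template string), not the article's actual models; the point is the structure.

```python
# Toy sketch of the RAG control flow: embed -> retrieve top-k -> generate.
# The embedder and generator below are placeholders, not real models.
import math
from collections import Counter

DOCS = [
    "RAG pairs an LLM's parametric memory with external documents.",
    "Fine-tuning the embedding model improves retrieval accuracy.",
    "Agentic RAG adds multi-step retrieval and tool use.",
]

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the LLM call: in a real system, the retrieved context
    # is prepended to the prompt so the model answers from it.
    return f"Answer '{query}' using:\n- " + "\n- ".join(context)

print(generate("Why fine-tune the retriever?", retrieve("fine-tune retrieval")))
```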
The MLOps Python Package version 4.1.0 is now available, focusing on increased automation and reproducibility for machine learning workflows. This release transitions task automation from PyInvoke to the cleaner 'Just' system, integrates Gemini Code Assist for AI-powered GitHub pull request reviews, automates the deployment of GitHub rulesets for consistency, and ensures deterministic builds using a constraints.txt file for locked dependencies. The companion Cookiecutter MLOps Package template has also been updated to include these enhancements, facilitating easier project setup. Users are encouraged to upgrade to benefit from these improvements.
# MLOps
# Python
# Data Science
# Machine Learning
# Artificial Intelligence
Vishakha Gupta & Saurabh Shintre · Mar 25th, 2025
Retrieval-augmented generation (RAG) is currently the standard architecture for building AI chatbots. But it has one limitation that can lead to potentially disastrous consequences in the enterprise: the inability to provide role-based access control and information security. To ensure sensitive or restricted information is never accidentally retrieved, it is essential to filter what enters a query's context based on the user's permissions and the sensitivity of the data (the pattern is sketched below). By integrating Realm's secure connectors with ApertureDB's graph-vector database engine, we deliver a scalable, real-time access control system ready for enterprise workloads.
# RAG
# Data privacy and security
# Knowledge graph and graph databases
# Vector/similarity/semantic search
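Below is a hypothetical sketch of the permission-aware retrieval pattern the post describes: documents carry access-control labels, and candidates are filtered by the caller's roles before any text reaches the LLM context. This is a generic illustration, not the Realm or ApertureDB API.

```python
# Hypothetical sketch of permission-aware retrieval. The ACL check happens
# at retrieval time, so restricted text can never be smuggled into the
# prompt, even if it matches the query well. Not the Realm/ApertureDB API.
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    allowed_roles: frozenset[str]

@dataclass
class User:
    name: str
    roles: frozenset[str]

CORPUS = [
    Doc("Q3 revenue forecast (finance only).", frozenset({"finance"})),
    Doc("Employee handbook.", frozenset({"finance", "engineering", "hr"})),
]

def retrieve_for(user: User, query: str, candidates: list[Doc]) -> list[str]:
    # Filter by role overlap first; only then rank what remains.
    visible = [d for d in candidates if d.allowed_roles & user.roles]
    # (A real system would rank `visible` by vector similarity to `query`.)
    return [d.text for d in visible]

eng = User("ada", frozenset({"engineering"}))
print(retrieve_for(eng, "revenue", CORPUS))  # handbook only, no forecast
```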
Efficient GPU orchestration is crucial in MLOps to support the distributed training and serving of increasingly complex models.
# NVIDIA
# GPU
# AMD
# Kubernetes
# Machine Learning
# MLOps
The post critiques AI evaluation methods from a physicist's perspective, highlighting a troubling lack of scientific rigor compared to fields like physics. While physicists meticulously define success criteria before experiments (like CERN's specific statistical requirements for the Higgs boson), AI benchmarking suffers from three critical problems:
- Benchmarks are abandoned once models perform well, creating an endless cycle without measuring meaningful progress.
- With models training on vast internet data, benchmarks are likely contaminated, essentially giving open-book exams to models that have already seen the material (a toy contamination check is sketched below).
- Current methods fail to properly measure generalization: whether models truly understand concepts versus memorizing patterns.
The author proposes a "Standard Model of AI Evaluation" bringing together cognitive scientists, AI researchers, philosophers, and evaluation experts to create hypothesis-driven benchmarks rather than difficulty-driven ones. This framework would require pre-registered hypotheses, contamination prevention strategies, and clearly defined success criteria. The post concludes by asking whether systems potentially transforming society deserve evaluation standards at least as rigorous as those used for testing new particles.
# AI
# Physics
# Methodology
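As a toy illustration of one contamination signal the post alludes to, the sketch below flags long verbatim n-gram overlap between a benchmark item and training text. Real contamination audits are far more involved; this only shows the idea, and all names here are illustrative.

```python
# Toy contamination check: fraction of a benchmark item's n-grams that
# appear verbatim in training text. A high score suggests the model may
# have seen the "exam" material during training.

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i : i + n]) for i in range(len(toks) - n + 1)}

def overlap_score(benchmark_item: str, training_text: str, n: int = 8) -> float:
    """Fraction of the item's n-grams found verbatim in the training text."""
    item = ngrams(benchmark_item, n)
    if not item:
        return 0.0
    return len(item & ngrams(training_text, n)) / len(item)

train = "the quick brown fox jumps over the lazy dog near the river bank today"
item = "the quick brown fox jumps over the lazy dog near the river"
print(f"{overlap_score(item, train):.2f}")  # high score flags likely leakage
```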
Jessica Michelle Rudd, PhD, MPH · Mar 12th, 2025
Dataplex acts as the ultimate pantry organizer for your data ecosystem, ensuring clarity, freshness, and accessibility. Creating structured "lakes" and "zones" helps teams efficiently manage data assets, track lineage, and maintain rich metadata documentation. With Dataplex, your data kitchen stays tidy, making it easier to serve up accurate and actionable insights.
# Dataplex
# Metadata
# AI
This series explores the potential of distributed MLOps in accelerating AI innovation. From foundational strategies like data and pipeline parallelism to advanced techniques for unifying mixed AMD and NVIDIA GPU clusters, the articles provide insights into building scalable, cost-effective systems.
- Distributed Training: Leveraging frameworks like PyTorch DDP, MPI, and Ray to split workloads across GPUs and nodes, reducing training times from years to days (a minimal DDP sketch follows this entry).
- Mixed Hardware Ecosystems: Bridging CUDA and ROCm with UCC/UCX to unify AMD and NVIDIA GPUs, eliminating vendor lock-in and maximizing infrastructure ROI.
- Kubernetes Orchestration: Automating GPU resource allocation, fault tolerance, and gang scheduling with tools like Volcano and Kubeflow for enterprise-scale efficiency.
- Performance Optimization: Techniques like RDMA, NUMA alignment, GPU sharing (MIG/SR-IOV), and collective communication tuning (NCCL/RCCL) to achieve near-linear scaling.
Whether scaling trillion-parameter models or managing mergers with fragmented infrastructures, this series describes how teams can transform the available hardware into unified collectives, driving faster innovation, reducing costs, and future-proofing MLOps pipelines.
# MLOps
# Machine Learning
# PyTorch
# Kubernetes
# Cloud
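As a companion to the Distributed Training bullet above, here is a minimal PyTorch DDP sketch. It assumes a CUDA machine (or ROCm, where the same backend name maps to RCCL) and a torchrun launch; the model and data are toy stand-ins, not code from the series.

```python
# Minimal DistributedDataParallel sketch; launch with e.g.:
#   torchrun --nproc_per_node=4 train.py
# torchrun sets LOCAL_RANK/RANK/WORLD_SIZE. DDP averages gradients across
# ranks inside backward(), which is where near-linear scaling comes from
# (given fast collectives such as NCCL, or RCCL on ROCm).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)

    # Toy model; in practice this is your real network.
    model = DDP(torch.nn.Linear(128, 10).to(device), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(10):  # stand-in for a DataLoader + DistributedSampler
        x = torch.randn(32, 128, device=device)
        y = torch.randint(0, 10, (32,), device=device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()  # gradients are all-reduced here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```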
Traditional RAG improves LLM accuracy by retrieving relevant documents but has limitations in query refinement and hallucination prevention. Agentic RAG enhances this by introducing AI agents that iteratively refine queries, assess retrieval quality, and make dynamic tool selections (a skeleton of this loop is sketched below). This blog explores the challenges of vanilla RAG, the advantages of agentic RAG, and a practical implementation using ApertureDB and SmolAgents for research paper retrieval.
# RAG
# AI Agents
# Gen AI
# Multimodal AI
# Vector Search
# Semantic Search
# Similarity Search
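Here is a plain-Python skeleton of the agentic loop described above: retrieve, judge whether the evidence answers the question, and refine the query if not. `search`, `good_enough`, and `refine` are hypothetical stand-ins for an LLM plus vector-database toolchain (such as SmolAgents with ApertureDB), not their real APIs.

```python
# Skeleton of an agentic RAG loop. All three helpers are hypothetical
# placeholders for LLM/tool calls, kept local so the sketch runs as-is.

def search(query: str) -> list[str]:
    # Stand-in for a vector-search tool call.
    corpus = {
        "transformer attention": ["'Attention Is All You Need' (2017)."],
        "attention": ["Survey of attention mechanisms."],
    }
    return corpus.get(query, [])

def good_enough(question: str, evidence: list[str]) -> bool:
    # Stand-in for an LLM judging retrieval quality.
    return len(evidence) > 0

def refine(question: str, query: str) -> str:
    # Stand-in for an LLM rewriting the query after a poor retrieval.
    return "transformer attention"

def agentic_rag(question: str, max_steps: int = 3) -> list[str]:
    query = question
    for _ in range(max_steps):
        evidence = search(query)
        if good_enough(question, evidence):
            return evidence  # hand off to the generator with this context
        query = refine(question, query)
    return []

print(agentic_rag("Which paper introduced transformer attention?"))
```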
Kopal Garg · Mar 3rd, 2025
This blog explores how NVIDIA's BioNeMo accelerates the design of therapeutic protein binders using generative AI. It walks through an end-to-end AI pipeline, showcasing how GPU-accelerated microservices enhance protein structure prediction, de novo binder design, molecular docking, and stability validation, reducing development time from months to days.
# Generative AI
# NVIDIA’s BioNeMo
# GPU-accelerated microservices
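The skeleton below mirrors the pipeline shape the post walks through: predict the target structure, generate candidate binders, dock and score them, then keep the stable ones. Every function is a hypothetical placeholder for a GPU-accelerated microservice call; none of this is the BioNeMo API itself.

```python
# Hypothetical skeleton of the binder-design pipeline: each function is a
# placeholder for a microservice call (folding, generative design, docking,
# stability filtering). Not the BioNeMo API; returns are dummy values.

def predict_structure(target_sequence: str) -> str:
    return f"structure({target_sequence[:8]}...)"   # e.g. a folding model

def design_binders(target_structure: str, n: int) -> list[str]:
    return [f"binder_{i}" for i in range(n)]        # e.g. a generative model

def dock(target_structure: str, binder: str) -> float:
    # Lower is better in many docking scores; placeholder value here.
    return -float(int(binder.rsplit("_", 1)[1]))

def is_stable(binder: str) -> bool:
    return binder.endswith(("0", "2"))              # e.g. a stability filter

def pipeline(target_sequence: str, n_candidates: int = 4) -> list[str]:
    structure = predict_structure(target_sequence)
    candidates = design_binders(structure, n_candidates)
    scored = sorted(candidates, key=lambda b: dock(structure, b))
    return [b for b in scored if is_stable(b)]

print(pipeline("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```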