Blog
# AI Cost Optimization
# LLM Tooling
# Semantic Search
# Vector Databases
# Redis
# Embeddings
How I Reduced AI Token Costs by 91% with Semantic Tool Selection and Redis
A system with 70+ automated tools was sending every tool definition with every query, which wasted tokens and slowed responses. The fix was a semantic tool selection system that uses Redis as a vector database and embeddings to infer user intent. By matching each query to only the tools it actually needs, this approach cut token consumption by 91% while improving accuracy.

Subham Kundu · Jan 20th, 2026
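The selection step summarized above can be sketched in a few lines of Python. This is a minimal, in-memory illustration under stated assumptions, not the author's implementation: the article uses Redis as the vector store and a real embedding model, whereas here a toy bag-of-words vector (the hypothetical `embed` helper) and a Python dict (the hypothetical `ToolIndex` class) stand in for both. The tool names and descriptions are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToolIndex:
    """Hypothetical in-memory stand-in for a vector store such as Redis.

    Tool descriptions are embedded once at index time, not on every query.
    """

    def __init__(self, tools: dict[str, str]):
        self._vectors = {name: embed(desc) for name, desc in tools.items()}

    def select(self, query: str, k: int = 3, min_score: float = 0.1) -> list[str]:
        """Return up to k tool names whose descriptions are most similar to the query."""
        q = embed(query)
        scored = sorted(
            ((cosine(q, vec), name) for name, vec in self._vectors.items()),
            reverse=True,
        )
        # Tools below the threshold are dropped, so unrelated definitions never reach the LLM.
        return [name for score, name in scored[:k] if score >= min_score]


# Made-up tool catalog for illustration.
TOOLS = {
    "get_weather": "get the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "create_invoice": "create a billing invoice for a customer",
}

index = ToolIndex(TOOLS)
selected = index.select("what is the weather in paris")
# Only the definitions of the tools in `selected` would be attached to the LLM request.
```

Here `selected` is `["get_weather"]`: the other two descriptions share no terms with the query, so they fall below `min_score` and are filtered out. In a production setup, `embed` would call an actual embedding model and `ToolIndex.select` would become a nearest-neighbor query against the vector database; the token saving comes from sending only the top-k definitions instead of all 70+.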

Médéric Hurier · Jan 13th, 2026
Generative AI has evolved at lightning speed, from LLMs and RAG to autonomous AI agents capable of reasoning, planning, and acting. But while creating a single agent is easy, managing thousands across an enterprise requires a full AI Agent Platform. This guide breaks down the architecture of a production-grade platform, covering layers like Interaction, Development, Core, Foundation, Information, Observability, and Trust. It shows how to build systems that are secure, scalable, and capable of delivering real business impact.
# AI Agents
# Artificial Intelligence
# Data Science
# Software Architecture
# Cloud Computing

Haziqa Sajid · Jan 6th, 2026
Learn how a natively multimodal database like ApertureDB, which supports true multimodality alongside vector search, helps healthcare ads stay compliant by flagging missing material facts and improving transparency.
Technologies like RAG (retrieval-augmented generation), semantic search systems, and generative applications wouldn’t be possible without vector databases. Very few of these databases, ApertureDB among them, can natively handle more than just text: they also work with images, audio, and other data types, which opens up new possibilities across industries like healthcare, retail, and finance.
For this example, we chose healthcare advertising because it offers a good blend of modalities. With strict rules around accuracy, disclosure, and patient privacy, it’s critical that marketing content include all Material Facts: details that could influence a patient’s understanding or choices.
In this blog, we will discuss how a combination of ApertureDB, Unstructured, and OpenAI can help detect and flag missing material facts in healthcare advertisements.
# Multimodal/Generative AI
# RAG
# Vector/similarity/semantic search

Vishakha Gupta · Dec 23rd, 2025
As AI applications move beyond rows and columns into images, video, embeddings, and graphs, traditional query languages like SQL and Cypher start to crack. This post explains why ApertureDB chose to design a JSON-based query language from scratch—one built for multimodal search, data processing, and scale. By aligning with how modern AI systems already communicate (JSON, agents, workflows, and natural language), ApertureDB avoids brittle joins, performance tradeoffs, and DIY pipelines, while still offering SQL and SPARQL wrappers for familiarity. The result is a layered, future-proof way to query, process, and explore multimodal data without forcing old abstractions onto new problems.
# Multimodal/Generative AI
# Usability and Debugging

Médéric Hurier · Dec 16th, 2025
The traditional centralized data platform, built on rigid data warehouses and complex ETL pipelines, creates technical bottlenecks that severely slow the delivery of business insights and leave decision-makers waiting on overburdened data engineering teams. The open-source prototype Da2a proposes a radical new paradigm: a distributed, agentic ecosystem in which specialized, autonomous agents (e.g., Marketing, E-commerce) manage their own domain data and collaborate via an Agent-to-Agent (A2A) protocol to answer complex, cross-domain queries. Rather than focusing on the engineering of data movement and storage, this insight-focused approach lets an orchestrator agent plan and delegate tasks, abstracting the underlying complexity. The result is greater scalability, extensibility, and alignment with high-level business logic, a critical evolution for MLOps engineers who want to build more flexible and responsive data foundations.
# Generative AI Tools
# Artificial Intelligence
# Machine Learning
# Data Science
# AI Agent

Kopal Garg · Dec 10th, 2025
Everyone obsesses over models, but NVIDIA’s stack makes it obvious: the real power move is owning everything around the model. NeMo trains it, RAPIDS cleans it, TensorRT speeds it up, Triton serves it, Operators manage it — and the hardware seals the deal.
It’s less a toolkit and more a gravity well for your entire GenAI pipeline. Once you’re in, good luck escaping.
# Generative AI
# AI Frameworks
# NVIDIA

Médéric Hurier · Dec 2nd, 2025
Overcome the friction of boilerplate code and infrastructure wrangling by adopting a declarative approach to AI agent development. This article introduces Ackgent, a production-ready template built on Google’s Agent Development Kit (ADK) and Agent Config, which lets developers define agent behaviors in structured YAML files while keeping implementation logic in Python. Learn how to use a modern stack, including uv, just, and the Model Context Protocol (MCP), to rapidly prototype, test, and deploy scalable multi-agent systems on Google Cloud Run.
# AI Agents
# Generative AI Agents
# Artificial Intelligence
# Google ADK
# Data Science

Kopal Garg · Nov 25th, 2025
An end-to-end DenseNet-121 pipeline for the MedNIST dataset was rebuilt with NVIDIA’s GPU-native tools, replacing traditional CPU-based stages such as Pillow, OpenCV, and the PyTorch DataLoader. On a Tesla T4, the GPU workflow delivered 3.3× higher throughput, ~3× lower latency, better memory efficiency, and higher hardware utilization. The post also outlines further gains from TensorRT, INT8 quantization, RAPIDS cuDF, and GPUDirect Storage that could push medical imaging pipelines closer to real-time performance.
# Inference Optimization
# NVIDIA
# Medical AI

Médéric Hurier · Nov 18th, 2025
Combo-Banana is an open-source prototype based on Google's Nano Banana, designed to empower product designers by allowing them to quickly define and automate their multi-step image editing workflows. This tool transforms tedious, repetitive tasks—like background removal, color correction, and integration into new scenes—into a fast, consistent, and automated process, freeing designers to focus on creative work.
# Generative AI Tools
# Productivity
# Artificial Intelligence
# Python
# Open Source

Kopal Garg · Nov 12th, 2025
LLMs can perform complex tasks like drafting contracts or answering medical questions, but without safeguards they pose serious risks, such as leaking PII, giving unauthorized advice, or enabling fraud. NVIDIA’s NeMo Guardrails provides a modular safety framework that enforces AI safety through configurable input and output guardrails, covering PII exposure, jailbreaks, legal liability, and regulatory violations. In high-stakes areas like healthcare, it blocks unauthorized diagnoses and helps support HIPAA/FDA compliance. Each blocked action includes explainable metadata for auditing and transparency, turning AI safety from a black-box filter into configurable, measurable infrastructure.
# LLMs
# NeMo Guardrails
# PII
# HIPAA/FDA
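The input/output rail pattern described in the NeMo Guardrails summary above can be illustrated with a small, framework-agnostic Python sketch. It is not NeMo Guardrails’ API or configuration format: `RailResult`, `input_rail`, `output_rail`, `guarded_call`, and the two regex rules are all hypothetical, chosen only to show how a check can run before and after the model call and how a blocked action can carry explainable metadata for auditing.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RailResult:
    allowed: bool
    text: str
    metadata: dict = field(default_factory=dict)  # audit info attached to blocked actions


# Hypothetical detectors; a real deployment would use far more robust checks.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
DIAGNOSIS_PATTERN = re.compile(r"\byou (?:have|are suffering from)\b", re.IGNORECASE)


def input_rail(user_text: str) -> RailResult:
    """Block prompts containing a US Social Security number before they reach the model."""
    match = SSN_PATTERN.search(user_text)
    if match:
        return RailResult(
            allowed=False,
            text="I can't process messages that contain Social Security numbers.",
            metadata={"rail": "input", "rule": "pii.ssn", "span": match.span()},
        )
    return RailResult(allowed=True, text=user_text)


def output_rail(model_text: str) -> RailResult:
    """Block model responses that read like a medical diagnosis."""
    match = DIAGNOSIS_PATTERN.search(model_text)
    if match:
        return RailResult(
            allowed=False,
            text="I can't provide a diagnosis. Please consult a licensed clinician.",
            metadata={"rail": "output", "rule": "medical.diagnosis", "matched": match.group(0)},
        )
    return RailResult(allowed=True, text=model_text)


def guarded_call(user_text: str, llm: Callable[[str], str]) -> RailResult:
    """Run the input rail, call the model only if allowed, then run the output rail."""
    checked = input_rail(user_text)
    if not checked.allowed:
        return checked  # the model is never called for blocked input
    return output_rail(llm(checked.text))
```

For example, `guarded_call("I have a headache", lambda t: "You have a migraine.")` returns a blocked result whose metadata names the `medical.diagnosis` rule, so an auditor can see which rail fired and why.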

Médéric Hurier · Nov 4th, 2025
Deploying AI agents in enterprises is complex, balancing security, scalability, and usability. This post compares deployment paths on Google Cloud—highlighting Cloud Run with IAP as the most secure and flexible option—and shows how teams can build powerful agents with ADK without losing the human touch.
# AI Agent
# Agentops
# Generative AI Tools
# Data Science
# Artificial Intelligence
