MLOps Community

Evaluation of Agentic System // Aditya Gautam // Agent Hour

Posted Apr 22, 2025 | Views 88
# Agents

Speaker

Aditya Gautam
Machine Learning Technical Lead @ Meta

Aditya is a seasoned AI expert and thought leader at the nexus of AI Integrity, recommendation systems, and LLM-powered agents, focusing on building trustworthy, efficient AI at scale. As a Machine Learning Technical Lead for Integrity at Meta, he architects large-scale AI systems to improve ranking algorithms, combat misinformation, and improve user engagement. He previously served as a founding engineer for a Computer Vision startup within Google’s prestigious Area 120 incubator.

Aditya is active in the Generative AI community as a sought-after speaker, panelist, and interviewee. He frequently shares insights on agentic system evaluation and LLM cost optimization on industry podcasts and at venues such as the Databricks Data + AI Summit 2025, Marktechpost, AI agent conference, Analytics Vidhya, and the MLOps Community, among others. His expertise, particularly on Generative AI and agent misinformation, has been featured in media articles, including in the Daily Herald and Marktechpost. His recent research, presented at ICWSM 2025, offers a blueprint for a multi-agent system for the misinformation lifecycle. He serves as an Ethics Reviewer for NeurIPS 2025 and has reviewed papers for conferences including ICML, AAAI, and ACM, among others.

SUMMARY

As complex AI agents become common, standard evaluation is no longer enough. This presentation gives a structured overview of agentic system evaluation. It briefly explores common single- and multi-agent patterns, explains why rigorous evaluation is necessary, and outlines core principles for conducting meaningful assessments. It then covers the main methods (benchmarks, simulation, human feedback) and metrics for evaluating agentic system performance, and highlights key challenges.
