MLOps Community

Evaluation of Agentic System // Aditya Gautam // Agent Hour

Posted Apr 22, 2025 | Views 15
# Agents

speaker

Aditya Gautam
Machine Learning Lead @ Meta

Aditya is a seasoned machine learning practitioner, currently leading the foundational integrity efforts for Llama models. He has led several LLM applications that enhance Facebook recommendation and ranking algorithms at scale. Some of his contributions in Reels include user interest exploration, trend detection, quality improvement, and safeguarding policies by detecting violations and mitigating misinformation. He holds a master’s degree from Carnegie Mellon University, has worked in machine learning at Google, and was a founding engineer of an AI startup at Area 120 (Google Incubator). Aditya is active in the Generative AI community, contributing through speaking, panel, and research engagements, and has shared his work at prominent conferences and summits, including the AI in Production Conference, AI Agent Conference, Generative AI Summit, and the Databricks Data + AI Summit, among others.

SUMMARY

As complex AI agents become common, standard evaluation isn't enough. This presentation provides a structured overview of agentic system evaluation. It briefly explores common single- and multi-agent patterns, explains why rigorous evaluation is necessary, and outlines core principles for conducting meaningful assessments. It also covers methods (benchmarks, simulation, human feedback) and metrics for evaluating agentic system performance, highlighting key challenges.
