Coding Agents Lunch & Learn, Session 6: Community Benchmarking for AI Coding Agents

In this session, we’ll explore ideas for building a community-driven benchmark for AI coding agents. The goal is to test how different LLMs and agent setups perform when solving the same tasks using shared prompts and tools.

We’ll discuss the concept of agent harnesses, how they enable consistent testing across frameworks, and how the community could contribute benchmark examples through a shared repository.

We’ll also dive into hooks: how they work, how to use them effectively, and how they can enhance agent workflows. As part of this, we’ll open the floor for participants to share creative and practical ways they’re using hooks in their own setups.

Finally, we’ll begin drafting a few example benchmark tasks together during the session and discuss how these tasks could evolve into a collaborative LLMOps benchmark dataset for evaluating coding agents.