Agent Evaluation & Benchmarking Summit

Scale AI

Wednesday, July 22, 2026 | San Francisco, CA | $199 | 250 attendees

About this event

Evaluating agents is one of the hardest problems in AI agent development. This one-day summit brings together the teams building evaluation frameworks, from task-specific benchmarks to end-to-end reliability metrics.

Hear from the creators of SWE-bench, GAIA, and AgentBench on what they've learned. Workshop sessions cover building custom evals, A/B testing agent behaviors, and measuring safety properties. If you're shipping agents to production, this is how you know they work.

