Establishing Best Practices in Building Rigorous Agentic Benchmarks

Neural Information Processing Systems

Benchmarks are essential for quantitatively tracking progress in AI. As AI agents become increasingly capable, researchers and practitioners have introduced agentic benchmarks to evaluate agents on complex, real-world tasks.