AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
