Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

Neural Information Processing Systems 

To address the challenge of evaluating time-varying and complex answers, we propose a novel Agent-as-a-Judge framework. Our method constructs task-specific judge agents based on a tree-structured rubric to assess answer correctness and source attribution. We evaluate ten frontier agentic search systems.
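
To make the idea of a tree-structured evaluation concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes that leaf criteria are binary checks on an answer (one for correctness, one for source attribution) and that a parent's score is the unweighted mean of its children. The names `CriterionNode` and `Check`, the keyword-based checks, and the averaging rule are all hypothetical choices made for illustration; in practice a judge agent would stand in for the simple lambdas used here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical check signature: takes the answer text, returns pass/fail.
Check = Callable[[str], bool]


@dataclass
class CriterionNode:
    """One node of an illustrative evaluation tree (assumed structure)."""
    name: str
    check: Optional[Check] = None  # only leaves carry a check
    children: List["CriterionNode"] = field(default_factory=list)

    def score(self, answer: str) -> float:
        # Leaf: run the check and map pass/fail to 1.0/0.0.
        if not self.children:
            if self.check is None:
                raise ValueError(f"leaf {self.name!r} has no check")
            return 1.0 if self.check(answer) else 0.0
        # Internal node: assumed aggregation is the unweighted mean of children.
        return sum(c.score(answer) for c in self.children) / len(self.children)


# Toy tree with one correctness leaf and one attribution leaf.
tree = CriterionNode("task", children=[
    CriterionNode("correctness", check=lambda a: "Paris" in a),
    CriterionNode("attribution", check=lambda a: "http" in a),
])

print(tree.score("Paris (source: https://example.org)"))  # 1.0
print(tree.score("Paris"))                                # 0.5
```

In this toy setup, an answer that is correct but cites no source earns partial credit. Other aggregation rules, such as treating some criteria as mandatory, would fit the same tree shape.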