Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
