Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

Neural Information Processing Systems

To address this limitation, we present Easy2Hard-Bench, a consistently formatted collection of six benchmark datasets spanning domains such as mathematics and programming problems, chess puzzles, and reasoning questions.