Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
