MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
–Neural Information Processing Systems
As models continue to improve, their performance on benchmarks such as MMLU has begun to plateau, making it increasingly difficult to discern differences in model capabilities.
Feb-17-2026, 09:48:02 GMT