CLR-Bench: Evaluating Large Language Models in College-level Reasoning
