EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving

Jun-22-2026, 05:43:11 GMT–Neural Information Processing Systems

We introduce EvaLearn, a pioneering benchmark designed to evaluate large language models (LLMs) on their learning capability and efficiency in challenging tasks, a critical yet underexplored aspect of model potential. EvaLearn contains 648 challenging problems across six task types, grouped into 182 sequences, with each sequence dedicated to one task type. Unlike most existing benchmarks, which evaluate models in parallel, EvaLearn requires models to solve problems sequentially, allowing them to leverage the experience gained from previous solutions. EvaLearn provides five comprehensive automated metrics to evaluate models and quantify their learning capability and efficiency. We extensively benchmark nine frontier models and observe varied performance profiles: some models, such as Claude-3.7-sonnet,

large language model, machine learning, natural language

Country:
- Asia > Middle East (0.45)
- North America > United States
  - Minnesota (0.27)

Genre:
- Research Report > Experimental Study (1.00)
- Overview (1.00)

Industry:
- Education (1.00)
- Health & Medicine > Therapeutic Area (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.95)
