DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

Neural Information Processing Systems 

The current paradigm of evaluating Large Language Models (LLMs) through static benchmarks comes with significant limitations, such as vulnerability to data contamination and a lack of adaptability to the evolving capabilities of LLMs.
