MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

Neural Information Processing Systems 

Evaluating large language models (LLMs) is challenging.