MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
