ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
