ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition