DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models
