DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models