ConStat: Performance-Based Contamination Detection in Large Language Models

Jasper Dekoninck 1, Mark Niklas Müller 1,2, Martin Vechev 1

1 Department of Computer Science

Neural Information Processing Systems 

Public benchmarks play an essential role in the evaluation of large language models.