Investigating Data Contamination in Modern Benchmarks for Large Language Models