Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination