Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
