LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring
