Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Open in new window