Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
