Auditing language models for hidden objectives
