Preventing Language Models From Hiding Their Reasoning
