Probing and Steering Evaluation Awareness of Language Models
