Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
