Improving Activation Steering in Language Models with Mean-Centring
