Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering
