Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
