SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models
