DISCO: Disentangled Communication Steering for Large Language Models
Neural Information Processing Systems
In contrast, we propose to inject steering vectors directly into the query and value representation spaces within attention heads. We provide evidence that a greater portion of these spaces exhibits high linear discriminability of concepts, a key property motivating the use of steering vectors, than attention head outputs do. We analytically characterize the effect of our method, which we term DISentangled COmmunication (DISCO) Steering, on attention head outputs. Our analysis reveals that DISCO disentangles a strong but underutilized baseline, steering attention head inputs, which implicitly modifies queries and values in a rigid manner. In contrast, DISCO's direct modulation of these components enables more granular control. We find that DISCO achieves superior performance over a number of steering vector baselines across multiple datasets on LLaMA 3.1 8B and Gemma 2 9B, with steering efficacy scores up to 19.1% higher than the runner-up. Our results support the conclusion that the query and value spaces are powerful building blocks for steering vector methods. Our code is publicly available at https://github.com/MaxTorop/DISCO.
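To make the idea of steering queries and values concrete, the following is a minimal NumPy sketch of a single causal attention head in which a steering vector is added to every query and, separately, to every value. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function name `attention_head`, the vectors `s_q` and `s_v`, and the scaling coefficients `alpha_q` and `alpha_v` are all hypothetical names chosen for this example, and how the steering vectors are estimated is left out entirely.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_head(X, W_q, W_k, W_v, s_q=None, s_v=None,
                   alpha_q=0.0, alpha_v=0.0):
    """Single causal attention head with optional query/value steering.

    X: (T, d_model) token representations entering the head.
    W_q, W_k, W_v: (d_model, d_head) projection matrices.
    s_q, s_v: (d_head,) steering vectors (hypothetical; assumed to be
        precomputed elsewhere).
    alpha_q, alpha_v: independent steering strengths (hypothetical names).
    """
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v

    # Steer the query and value spaces separately, each with its own
    # strength, rather than shifting the head input X once.
    if s_q is not None:
        Q = Q + alpha_q * s_q
    if s_v is not None:
        V = V + alpha_v * s_v

    scores = Q @ K.T / np.sqrt(Q.shape[-1])

    # Causal mask: position t may only attend to positions <= t.
    T = X.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    A = softmax(scores, axis=-1)
    return A @ V
```

In this sketch, value steering shifts every output row by exactly `alpha_v * s_v`, because each row of attention weights sums to one, whereas query steering changes only how attention is distributed over positions. Keeping `alpha_q` and `alpha_v` as separate knobs is one way to picture the more granular control described above.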
Jun-17-2026, 08:12:19 GMT