PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration

Manjiang Yu, Hongji Li, Priyanka Singh, Xue Li, Di Wang, Lijie Hu

arXiv.org Artificial Intelligence 

Reliable behavior control is central to deploying Large Language Models (LLMs) on the web. Activation steering offers a tuning-free route to aligning attributes (e.g., truthfulness) that ensure trustworthy generation. Prevailing approaches rely on coarse heuristics and lack a principled account of where to steer and how strongly to intervene. To this end, we propose Position-wise Injection with eXact Estimated Levels (PIXEL), a position-wise activation steering framework that, in contrast to prior work, learns a property-aligned subspace from dual views (tail-averaged and end-token) and selects intervention strength via a constrained geometric objective with a closed-form solution, thereby adapting to token-level sensitivity without global hyperparameter tuning. PIXEL further performs sample-level orthogonal residual calibration to refine the global attribute direction and employs a lightweight position-scanning routine to identify receptive injection sites. We additionally provide representation-level guarantees for the minimal-intervention rule, supporting reliable alignment. Across diverse models and evaluation paradigms, PIXEL consistently improves attribute alignment while preserving the model's general capabilities, offering a practical and principled method for controllable generation with LLMs.
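To make the two core ideas concrete, the following is a minimal sketch, not the authors' implementation: a dual-view (tail-averaged and end-token) difference-of-means estimate of an attribute direction, and a closed-form minimal intervention that shifts a hidden state along that direction only as far as needed to reach a target projection threshold. All function names, the averaging scheme, and the threshold-style constraint are illustrative assumptions; the paper's constrained geometric objective, subspace calibration, and position scanning are not reproduced here.

```python
import numpy as np

def dual_view_direction(pos_acts, neg_acts, tail=4):
    """Estimate a unit attribute direction from two views of the activations.

    pos_acts / neg_acts: lists of (seq_len, hidden_dim) arrays from examples
    with and without the target attribute. The "tail-averaged" view averages
    the last `tail` token positions; the "end-token" view uses only the final
    position. (The specific combination rule here is an assumption.)
    """
    def view_means(acts):
        tail_view = np.mean([a[-tail:].mean(axis=0) for a in acts], axis=0)
        end_view = np.mean([a[-1] for a in acts], axis=0)
        return tail_view, end_view

    p_tail, p_end = view_means(pos_acts)
    n_tail, n_end = view_means(neg_acts)
    # Combine the two difference-of-means directions, then normalize.
    v = (p_tail - n_tail) + (p_end - n_end)
    return v / np.linalg.norm(v)

def minimal_intervention(h, v, tau):
    """Smallest shift alpha * v such that the steered state clears a
    projection threshold: min |alpha| s.t. (h + alpha * v) @ v >= tau.
    For unit v this has the closed form alpha = max(0, tau - h @ v),
    so states already past the threshold are left untouched.
    """
    alpha = max(0.0, tau - float(h @ v))
    return h + alpha * v, alpha
```

The closed form illustrates why a minimal-intervention rule adapts per position: tokens whose representation already satisfies the constraint receive zero steering, while others are moved just enough, with no global strength hyperparameter.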