NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs

Pan, Birong, Xu, Mayi, Pi, Qiankun, Chen, Jianhao, Zhu, Yuanyuan, Zhong, Ming, Qian, Tieyun

Aug-14-2025–arXiv.org Artificial Intelligence

Ensuring robust safety alignment while preserving utility is critical for the reliable deployment of Large Language Models (LLMs). However, current techniques suffer from intertwined deficiencies: insufficient robustness against malicious attacks, frequent refusal of benign queries, and degradation in generated text quality and general task performance. The former two reflect deficits in robust safety, while the latter constitutes utility impairment. We trace these limitations to the coarse-grained, layer-wise interventions used by existing methods. To resolve this, we propose NeuronTune, a fine-grained framework that dynamically modulates sparse neurons to optimize safety and utility simultaneously. Our approach first identifies safety-critical and utility-preserving neurons across all layers via attribution, then employs meta-learning to adaptively amplify safety-neuron activations and suppress utility-neuron activations. Crucially, NeuronTune allows the intervention scope to be tuned via neuron-count thresholds, supporting flexible adaptation to security-critical or utility-priority scenarios. Extensive experimental results demonstrate that our method significantly outperforms existing state-of-the-art methods, achieving superior model safety while maintaining excellent utility.
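The two-stage idea in the abstract (select a sparse set of neurons by attribution score, then rescale their activations) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_k_neurons` and `modulate`, the fixed scale factors, and the rule that a neuron in both sets is treated as a safety neuron are all assumptions made for illustration. In particular, the abstract says the scaling is learned adaptively via meta-learning, whereas this sketch uses hand-set constants as a stand-in.

```python
from typing import List, Sequence


def top_k_neurons(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest attribution scores.

    k plays the role of the neuron-count threshold: a larger k widens
    the intervention scope. Ties are broken by lower index (assumption).
    """
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:max(0, k)])


def modulate(
    activations: Sequence[float],
    safety_idx: Sequence[int],
    utility_idx: Sequence[int],
    amplify: float = 1.5,   # hypothetical fixed factor, not a learned value
    suppress: float = 0.5,  # hypothetical fixed factor, not a learned value
) -> List[float]:
    """Scale selected neurons and leave every other activation unchanged."""
    out = list(activations)
    safety = set(safety_idx)
    for i in safety:
        out[i] *= amplify
    for i in utility_idx:
        # Assumption: a neuron selected by both criteria is only amplified.
        if i not in safety:
            out[i] *= suppress
    return out


if __name__ == "__main__":
    # Toy attribution scores for a 4-neuron layer (made-up numbers).
    safety_scores = [0.1, 0.9, 0.3, 0.7]
    utility_scores = [0.8, 0.2, 0.6, 0.1]

    safety_idx = top_k_neurons(safety_scores, k=2)    # neurons 1 and 3
    utility_idx = top_k_neurons(utility_scores, k=1)  # neuron 0

    acts = [1.0, 2.0, 3.0, 4.0]
    print(modulate(acts, safety_idx, utility_idx, amplify=2.0, suppress=0.5))
    # prints [0.5, 4.0, 3.0, 8.0]
```

Raising or lowering `k` for each neuron group is the sketch's analogue of the neuron-count thresholds the abstract describes for trading off security-critical against utility-priority settings.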

large language model, machine learning, natural language

Country:
- North America > Mexico (0.28)

Genre:
- Research Report (0.70)
- Overview (0.66)

Industry:
- Information Technology > Security & Privacy (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.70)