Aligning Large Language Models with Representation Editing: A Control Perspective
Neural Information Processing Systems
Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system.
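The dynamical-system view can be illustrated with a toy sketch. Below, a hidden state evolves under a fixed nonlinear transition (a stand-in for an autoregressive model's state update), and a simple additive control signal — the "representation edit" — steers the trajectory toward a target direction at test time. All names, dimensions, and the proportional-control rule here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Hedged sketch: model the hidden state as a discrete-time system
# h_{t+1} = f(h_t), and steer it with an additive control u_t.
# W, the tanh dynamics, and the control rule are toy assumptions.

rng = np.random.default_rng(0)
d = 8                                     # toy hidden-state dimension
W = rng.normal(scale=0.3, size=(d, d))    # stand-in transition weights

def step(h, u=None):
    """One transition of the toy dynamics, optionally perturbed by a control u."""
    h_next = np.tanh(W @ h)
    if u is not None:
        h_next = h_next + u               # representation edit: nudge the state
    return h_next

# Direction we want the state to align with (proxy for an "aligned" behavior).
target = np.ones(d) / np.sqrt(d)

h_free = rng.normal(size=d)               # uncontrolled trajectory
h_ctrl = h_free.copy()                    # controlled trajectory
for _ in range(10):
    h_free = step(h_free)
    u = 0.5 * (target - h_ctrl)           # simple proportional control
    h_ctrl = step(h_ctrl, u)

align_free = float(h_free @ target)
align_ctrl = float(h_ctrl @ target)
print(align_free, align_ctrl)
```

Under these assumptions the controlled state ends up measurably more aligned with the target direction than the free-running one, which is the intuition behind editing representations rather than weights: the base dynamics are untouched, and only a small test-time control signal is applied.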