Aligning Large Language Models with Representation Editing: A Control Perspective

Oct-11-2025, 00:18:44 GMT–Neural Information Processing Systems

Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system.
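
To make the dynamical-system view concrete, here is a minimal toy sketch, not the paper's implementation. It assumes the hidden representation plays the role of the state, the sampled token supplies the stochasticity, and a representation edit enters as an additive control on the state. The names `step` and `rollout`, the identity readout, and the decay-and-bump transition are all hypothetical choices made for illustration only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, control, rng):
    """One transition of a toy discrete-time stochastic system.

    state   -- hidden representation h_t (list of floats)
    control -- additive edit u_t applied to h_t (assumed form of the edit)
    """
    edited = [h + u for h, u in zip(state, control)]
    # Toy readout (assumption): the edited state doubles as the token logits.
    probs = softmax(edited)
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Toy transition (assumption): decay the state, then bump the
    # coordinate of the sampled token.
    next_state = [0.5 * h for h in edited]
    next_state[token] += 1.0
    return next_state, token


def rollout(initial_state, controls, seed=0):
    """Apply one control per time step and collect the sampled tokens."""
    rng = random.Random(seed)
    state, tokens = list(initial_state), []
    for u in controls:
        state, token = step(state, u, rng)
        tokens.append(token)
    return state, tokens


if __name__ == "__main__":
    zero = [0.0, 0.0, 0.0]
    # Uncontrolled rollout: tokens are sampled from the unedited state.
    print(rollout(zero, [zero] * 3))
    # A strong edit toward coordinate 2 steers the sampled tokens.
    print(rollout(zero, [[0.0, 0.0, 50.0]] * 3))
```

In this toy, the base transition is never modified; only the state is edited at each step, which mirrors the contrast the abstract draws with fine-tuning.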

arXiv preprint, language model, value function

Country:
- North America > United States > Florida > Orange County > Orlando (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Banking & Finance (1.00)
- Media (0.67)
- Health & Medicine > Consumer Health (0.67)
- Law (0.67)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
