T: Localized Fine-tuning on LLM Representations

May-28-2025, 12:13:56 GMT–Neural Information Processing Systems

Recent work in interpretability shows that large language models (LLMs) can be adapted to new tasks in a learning-free way: intervening on LLM representations can elicit desired behaviors for alignment. For instance, adding certain bias vectors to the outputs of certain attention heads is reported to boost the truthfulness of models. In this work, we show that localized fine-tuning serves as an effective alternative to such representation intervention methods.
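
The representation intervention mentioned above (adding bias vectors to the outputs of selected attention heads) can be pictured with a small NumPy sketch. This is an illustration only, not the paper's implementation: the function name `add_head_bias`, the tensor layout `(seq_len, n_heads, head_dim)`, the chosen head indices, and the bias values are all assumptions made for the example.

```python
import numpy as np

def add_head_bias(head_outputs, biases):
    """Add a bias vector to the outputs of selected attention heads.

    head_outputs: array of shape (seq_len, n_heads, head_dim) (assumed layout).
    biases: dict mapping a head index to a (head_dim,) bias vector.
    Returns a new array; heads not listed in `biases` are left unchanged.
    """
    out = head_outputs.copy()
    for head_idx, vec in biases.items():
        # Broadcast the bias over every token position for this head.
        out[:, head_idx, :] += vec
    return out

# Toy activations: 4 tokens, 8 heads, 16 dimensions per head (hypothetical sizes).
rng = np.random.default_rng(0)
head_outputs = rng.normal(size=(4, 8, 16))

# Hypothetical choice of heads and bias values, for illustration only.
biases = {2: np.full(16, 0.5), 5: np.full(16, -0.25)}
steered = add_head_bias(head_outputs, biases)
```

In this sketch only heads 2 and 5 are shifted, which is what makes the change localized: every other head passes through untouched. How the heads are selected and how the vectors are obtained is not specified in this excerpt.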

large language model, machine learning, natural language

Country:
- Asia (1.00)
- North America > United States
  - Minnesota > Hennepin County > Minneapolis (0.14)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (1.00)

Industry:
- Education (0.46)
- Government (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.96)
  - Natural Language > Large Language Model (1.00)