Representation Engineering for Large-Language Models: Survey and Research Challenges

Bartoszcze, Lukasz, Munshi, Sarthak, Sukidi, Bryan, Yen, Jennifer, Yang, Zejia, Williams-King, David, Le, Linh, Asuzu, Kosi, Maple, Carsten

Feb-24-2025–arXiv.org Artificial Intelligence

Large-language models are capable of completing a variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to resolve this problem by using samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt engineering and fine-tuning. We outline risks such as decreased performance, increased compute time and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe and personalizable LLMs.
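As a rough illustration of the idea of using contrasting inputs to detect and edit a concept representation, the sketch below computes a difference-of-means direction from two sets of activation vectors, scores a hidden state against it, and shifts the hidden state along it. This is only one simple approach, chosen here for illustration, and is not necessarily how the survey formalizes the method. The toy 3-dimensional vectors stand in for real LLM hidden states, and all function names (`concept_direction`, `concept_score`, `steer`) and the scaling factor `alpha` are hypothetical.

```python
# Illustrative sketch only: toy vectors stand in for real LLM hidden states.

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def concept_direction(positive, negative):
    """Difference between the mean activations of two contrasting input sets."""
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def concept_score(hidden, direction):
    """Detect: project a hidden state onto the concept direction (dot product)."""
    return sum(h * d for h, d in zip(hidden, direction))

def steer(hidden, direction, alpha):
    """Edit: shift a hidden state along the concept direction by a factor alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Hypothetical activations collected for contrasting prompts (assumed data).
honest = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
dishonest = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]

direction = concept_direction(honest, dishonest)
hidden = [1.0, 1.0, 1.0]
score = concept_score(hidden, direction)
steered = steer(hidden, direction, alpha=0.5)
```

In a real model the activations would be read from a chosen layer and the edit applied during the forward pass. The choice of layer and of `alpha` affects both steerability and task performance, which relates to the risks the abstract mentions.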
