Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple
arXiv.org Artificial Intelligence
Large language models can complete a wide variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to address this problem with a new approach that uses samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness, or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt engineering, and fine-tuning. We outline risks such as decreased performance, increased compute time, and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe, and personalizable LLMs.
Feb-24-2025
- Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Genre:
- Overview (1.00)
- Research Report > New Finding (0.67)
- Industry:
- Education (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.92)
- Technology: