Representation Engineering for Large-Language Models: Survey and Research Challenges
Bartoszcze, Lukasz, Munshi, Sarthak, Sukidi, Bryan, Yen, Jennifer, Yang, Zejia, Williams-King, David, Le, Linh, Asuzu, Kosi, Maple, Carsten
arXiv.org Artificial Intelligence
Large-language models are capable of completing a variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to resolve this problem through a new approach utilizing samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt-engineering and fine-tuning. We outline risks such as performance decrease, compute time increases and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe and personalizable LLMs.
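The contrast-based idea in the abstract can be illustrated with a minimal sketch. It assumes a difference-of-means direction, which is one common way to turn contrasting samples into a concept vector and is not necessarily the method the survey formalizes. The activations are synthetic NumPy arrays standing in for model hidden states, and the names `concept_direction`, `concept_score`, and `steer` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def concept_direction(pos_acts, neg_acts):
    """Unit vector pointing from the mean negative activation to the mean positive one."""
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def concept_score(acts, direction):
    """Detection: project each activation onto the concept direction."""
    return acts @ direction


def steer(acts, direction, alpha):
    """Editing: shift every activation by alpha along the concept direction."""
    return acts + alpha * direction


# Synthetic stand-ins for hidden states (assumption: real use would read
# these from a model layer for contrasting prompts, e.g. honest vs. dishonest).
rng = np.random.default_rng(0)
dim = 16
true_dir = np.zeros(dim)
true_dir[0] = 1.0
pos = rng.normal(size=(200, dim)) + 2.0 * true_dir  # concept present
neg = rng.normal(size=(200, dim)) - 2.0 * true_dir  # concept absent

direction = concept_direction(pos, neg)
pos_score = concept_score(pos, direction).mean()
neg_score = concept_score(neg, direction).mean()

# Push the "concept absent" samples toward the concept.
steered = steer(neg, direction, alpha=4.0)
steered_score = concept_score(steered, direction).mean()
```

Because `direction` has unit length, steering by `alpha` raises every projection score by exactly `alpha`, which makes the strength of the edit easy to reason about in this toy setting.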
Feb-24-2025
- Country:
  - Asia
    - China > Tianjin Province
      - Tianjin (0.04)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
  - Europe
    - Italy > Marche
      - Ancona Province > Ancona (0.04)
    - Latvia > Lubāna Municipality
      - Lubāna (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.14)
      - West Midlands > Coventry (0.04)
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - United States
      - California > San Francisco County
        - San Francisco (0.14)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - North Carolina > Orange County
        - Chapel Hill (0.04)
      - Washington > King County
        - Seattle (0.04)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
- Genre:
  - Overview (1.00)
  - Research Report > New Finding (0.67)
- Industry:
  - Education (1.00)
  - Health & Medicine (1.00)
  - Information Technology > Security & Privacy (0.92)
- Technology: