Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
Brumley, Madeline, Kwon, Joe, Krueger, David, Krasheninnikov, Dmitrii, Anwar, Usman
arXiv.org Artificial Intelligence
A key objective of interpretability research on large language models (LLMs) is to develop methods for robustly steering models toward desired behaviors. To this end, two distinct approaches to interpretability, "bottom-up" and "top-down", have been presented, but there has been little quantitative comparison between them. We present a case study comparing the effectiveness of representative vector steering methods from each branch: function vectors (FV; arXiv:2310.15213) as a bottom-up method, and in-context vectors (ICV; arXiv:2311.06668) as a top-down method. While both aim to capture compact representations of broad in-context learning tasks, we find that each is effective only on specific types of tasks: ICVs outperform FVs at behavioral shifting, whereas FVs excel on tasks requiring more precision. We discuss the implications of these findings for future evaluations of steering methods and for further research into top-down and bottom-up steering.
Nov-11-2024
- Country:
  - South America > Colombia > Meta Department > Villavicencio (0.04)
  - North America > United States > Massachusetts (0.04)
  - North America > United States > California > San Diego County > San Diego (0.04)
  - North America > Canada > Quebec > Montreal (0.04)
  - Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
  - Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
  - Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Genre:
  - Research Report > New Finding (0.47)
- Technology: