Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning?
Ye, Mengyu; Kuribayashi, Tatsuki; Kobayashi, Goro; Suzuki, Jun
arXiv.org Artificial Intelligence
Elucidating the rationale behind neural models' outputs has long been a challenge in machine learning, and it remains one in the age of large language models (LLMs) and in-context learning (ICL). For input attribution (IA), ICL raises a new question: which example in the prompt, which consists of a set of demonstrations, contributed to identifying the task or rule to be solved? To address this, we introduce synthetic diagnostic tasks inspired by the poverty-of-the-stimulus design in inductive reasoning. In these tasks, most in-context examples are ambiguous with respect to their underlying rule, and a single critical example disambiguates the demonstrated task. We then ask whether conventional IA methods can identify this critical example when interpreting the inductive reasoning process in ICL. Our experiments yield several practical findings; for example, one simple IA method works best, and, in general, the larger the model, the harder it is to interpret ICL with gradient-based IA methods.
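The diagnostic setup described above can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the authors' tasks or code: the two candidate rules (`first_word` vs `longest_word`), the demonstrations, and the "learner", which is a simple symbolic rule filter standing in for an LLM. Example-level leave-one-out occlusion is used as one simple IA method; the abstract does not say which method performed best, so this is not a claim about that result. Most demonstrations fit both rules, and one critical example fits only `longest_word`. A good attribution method should single out that example.

```python
from typing import Callable

# Hypothetical candidate rules (illustrative only, not the paper's tasks).
Rule = Callable[[list[str]], str]
RULES: dict[str, Rule] = {
    "first_word": lambda words: words[0],
    "longest_word": lambda words: max(words, key=len),
}

Example = tuple[list[str], str]


def predict_proba(examples: list[Example], query: list[str], answer: str) -> float:
    """Toy stand-in for an in-context learner (an assumption, not an LLM).

    Uniform prior over RULES: keep only the rules consistent with every
    demonstration, then return the fraction of them mapping query -> answer.
    """
    consistent = [
        rule for rule in RULES.values()
        if all(rule(x) == y for x, y in examples)
    ]
    if not consistent:
        return 0.0
    return sum(rule(query) == answer for rule in consistent) / len(consistent)


def leave_one_out(examples: list[Example], query: list[str], answer: str) -> list[float]:
    """Attribution of example i = drop in p(answer) when example i is removed."""
    full = predict_proba(examples, query, answer)
    return [
        full - predict_proba(examples[:i] + examples[i + 1:], query, answer)
        for i in range(len(examples))
    ]


# Hypothetical demonstrations: index 2 is the only disambiguating example
# (its first word is not its longest word).
demos: list[Example] = [
    (["elephant", "cat", "dog"], "elephant"),
    (["banana", "fig", "kiwi"], "banana"),
    (["ox", "giraffe", "bee"], "giraffe"),
    (["strawberry", "plum", "pear"], "strawberry"),
]
# A query on which the two rules disagree.
query = ["ant", "hippopotamus", "yak"]

if __name__ == "__main__":
    scores = leave_one_out(demos, query, "hippopotamus")
    print(scores)  # [0.0, 0.0, 0.5, 0.0]
    print(max(range(len(scores)), key=scores.__getitem__))  # 2
```

Removing any ambiguous example leaves only `longest_word` consistent, so the prediction is unchanged and the attribution is 0. Removing the critical example lets both rules back in, halving the probability of the correct answer. In this toy, occlusion recovers the critical example exactly. With a real LLM, attribution scores are noisy, and whether a given IA method recovers the critical example is the empirical question the paper studies.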
Dec-20-2024
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Texas (0.04)
- New York (0.04)
- Washington > King County
- Seattle (0.04)
- California > San Diego County
- San Diego (0.04)
- Canada
- Ontario > Toronto (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Europe
- Austria (0.04)
- Germany > Berlin (0.04)
- France (0.04)
- Italy > Tuscany
- Florence (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Asia
- British Indian Ocean Territory > Diego Garcia (0.04)
- Middle East
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Japan > Honshū
- Tōhoku (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Technology: