Can In-context Learners Learn a Reasoning Concept from Demonstrations?
Štefánik, Michal, Kadlčík, Marek
–arXiv.org Artificial Intelligence
Language models exhibit an emergent ability to learn a new task from a small number of input-output demonstrations. However, recent work shows that in-context learners largely rely on their pre-trained knowledge, such as the sentiment of the labels, instead of learning new associations from the input. We argue that the commonly-used few-shot evaluation using a random selection of in-context demonstrations can not disentangle models' reliance on such biases, as most of the randomly-selected demonstrations do not present relations informative for prediction beyond exposing the task's input-output distribution. Therefore, to evaluate models' in-context learning ability independent of models' memory, we introduce a Concept-sharing few-shot learning method choosing the demonstrations that share an underlying concept with the predicted sample. We extract a set of such concepts from available human explanations and measure how much models can benefit from presenting these concepts in few-shot demonstrations. We find that most of the recent in-context learners can not consistently benefit from the demonstrated concepts, irrespective of the model size. However, we note that T0 models are more sensitive to exhibited concepts, benefiting from concept-sharing demonstrations in 7 out of 8 evaluation scenarios.
arXiv.org Artificial Intelligence
Jul-19-2023
- Country:
- North America
- United States (0.04)
- Dominican Republic (0.04)
- Europe
- Czechia (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- Middle East > UAE
- North America
- Genre:
- Research Report (0.50)
- Technology: