Understanding In-Context Learning from Repetitions
Jianhao Yan, Jin Xu, Chiyu Song, Chenming Wu, Yafu Li, Yue Zhang
– arXiv.org Artificial Intelligence, Oct-9-2023
This paper explores the elusive mechanism underpinning in-context learning in Large Language Models (LLMs). Our work provides a novel perspective by examining in-context learning through the lens of surface repetitions. We quantitatively investigate the role of surface features in text generation and empirically establish the existence of token co-occurrence reinforcement, a principle that strengthens the relationship between two tokens based on their contextual co-occurrences. By investigating the dual impacts of these features, our research illuminates the internal workings of in-context learning and expounds on the reasons for its failures. This paper makes an essential contribution to the understanding of in-context learning and its potential limitations, offering a fresh perspective on this exciting capability.

The impressive ability of Large Language Models (LLMs; Touvron et al. (2023); Chowdhery et al. (2022); OpenAI (2023)) to execute in-context learning (ICL) is a standout characteristic. This behavior mirrors human learning and reasoning from analogy (Winston, 1980), enabling LLMs to rapidly adapt to a range of downstream tasks. Without being explicitly pretrained to learn from demonstrations, LLMs can predict responses to unseen test queries from only a few demonstrations, without any explicit instruction (Brown et al., 2020; Zhang et al., 2022; Chowdhery et al., 2022). An example of in-context learning can be found in Figure 1(a), where a pre-trained LLaMA model is given demonstrations for a binary classification task and learns to make correct predictions.

Despite its success in applications, the working mechanism of in-context learning is still an open question. We take a feature-centric view to understand ICL, analyzing the key patterns in the input context that correlate with ICL behavior. In particular, as Figure 1(b) shows, in-context demonstrations can not only produce the desired effects but also cause errors. In this example, the same LLaMA model makes the incorrect prediction "True" given the input "Circulation revenue has decreased by 5% in Finland.", likely because of the repeated pattern "Answer:" -> "True" from the demonstrations. From the same perspective, the success case in Figure 1(a) can be attributed to learning desired patterns such as "Answer:" -> "True|False" from the demonstrations.
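The failure mode in Figure 1(b) suggests a simple probe: if token co-occurrence reinforcement is at work, the probability the model assigns to "True" after "Answer:" should grow as the "Answer:" -> "True" pattern is repeated in the context, regardless of the query's correct label. Below is a minimal sketch of such a probe, assuming a HuggingFace-style causal LM; gpt2 stands in for the LLaMA model studied in the paper, and the demonstration and query sentences are illustrative placeholders rather than the paper's data.

```python
# Minimal sketch of a co-occurrence-reinforcement probe (assumptions:
# gpt2 as a small stand-in model; placeholder demonstration sentences).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Every demonstration pairs "Answer:" with "True", reinforcing that pattern.
demo = 'Sentence: "Profits rose by 10%." Answer: True\n'
# Query whose correct label would be "False" under the intended task.
query = 'Sentence: "Circulation revenue has decreased by 5% in Finland." Answer:'

true_id = tokenizer.encode(" True")[0]  # leading space per GPT-2 BPE convention

for n_demos in [0, 1, 2, 4, 8]:
    prompt = demo * n_demos + query
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits at "Answer:"
    p_true = torch.softmax(logits, dim=-1)[true_id].item()
    print(f"{n_demos} demos -> P(' True' | context) = {p_true:.3f}")
```

Under the paper's hypothesis, the printed probability should rise with the number of all-"True" demonstrations even though the query arguably warrants "False", mirroring the error in Figure 1(b).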