self-reinforcement effect



Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation

Neural Information Processing Systems

While large-scale neural language models, such as GPT2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in human corpora (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probability of repetitive tokens and their previous repetitions in context. Through our quantitative experiments, we find that 1) models have a preference to repeat the previous sentence; 2) sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; and 3) sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method, DITTO (PseuDo-RepetITion PenalizaTiOn), where the model learns to penalize probabilities of sentence-level repetitions from synthetic repetitive data. Although our method is motivated by mitigating repetitions, our experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.
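As a rough illustration of the DITTO objective described above (a minimal sketch, not the authors' implementation): the training method penalizes the token probabilities of a sentence's n-th repetition so that they decay toward a fraction λ of the probabilities under the (n-1)-th repetition, via a loss of the form -log(1 - |p_n - λ·p_{n-1}|). The function name and the toy probability values below are illustrative assumptions.

```python
import numpy as np

def ditto_loss(p_n, p_prev, lam=0.5, eps=1e-9):
    """Pseudo-repetition penalization loss (sketch).

    p_n:    token probabilities of a sentence on its n-th repetition
    p_prev: probabilities of the same tokens on the (n-1)-th repetition
    lam:    decay factor lambda; the target is p_n ~ lam * p_prev, so the
            probability of a sentence shrinks each time it is repeated
    """
    p_n = np.asarray(p_n, dtype=float)
    p_prev = np.asarray(p_prev, dtype=float)
    # -log(1 - |p_n - lam * p_prev|), averaged over the sentence's tokens
    return float(-np.log(1.0 - np.abs(p_n - lam * p_prev) + eps).mean())

# Self-reinforced repetition (probability grew across repeats) incurs a
# larger loss than behavior where the probability decayed toward lam*p_prev.
reinforced = ditto_loss(p_n=[0.9, 0.8], p_prev=[0.6, 0.5], lam=0.5)
decayed = ditto_loss(p_n=[0.3, 0.25], p_prev=[0.6, 0.5], lam=0.5)
assert reinforced > decayed
```

Under this objective, a model that keeps raising the probability of a repeated sentence is pushed back toward the decayed target, which is the self-reinforcement effect the paper aims to break.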


Appendix of 'Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation'

Neural Information Processing Systems

We calculate it for each sequence x and average over the whole corpus. When decoding auto-regressively, the probabilities of the repetitive sentence loops also have a self-reinforcement effect: as shown in Figure 2, the probability of the token 'located' (y-axis) increases as the number of historical repetitions grows. Here we use the end token to split sentences for ease of experiments. (This work was conducted at Apple. Figures are best viewed in color, zoomed in on a desktop monitor.)




Penalty Decoding: Well Suppress the Self-Reinforcement Effect in Open-Ended Text Generation

Zhu, Wenhong, Hao, Hongkun, Wang, Rui

arXiv.org Artificial Intelligence

The decoding algorithm is critical for open-ended text generation, transforming latent representations into coherent and meaningful outputs. This paper investigates the self-reinforcement effect in text generation and the effectiveness of a repetition penalty in mitigating it. However, determining the optimal repetition penalty value is challenging. To tackle this, we propose a forgetting mechanism that disregards distant tokens, reducing the burden of penalty selection. In addition, we introduce a length penalty to address overly short sentences caused by excessive penalties. Our penalty decoding approach, which incorporates these three strategies, helps resolve issues with sampling methods that deviate from factual information. Experimental results demonstrate the efficacy of our approach in generating high-quality sentences resembling human output.
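A minimal sketch of how the three strategies might compose at one decoding step (the window size, penalty value, minimum length, and function name are illustrative assumptions, not the paper's exact formulation): a CTRL-style repetition penalty is applied only to tokens inside a recent window (the forgetting mechanism), and the end-of-sequence logit is suppressed until a minimum length is reached (the length penalty).

```python
import numpy as np

def penalty_decode_step(logits, generated, eos_id,
                        penalty=1.2, window=64, min_len=10):
    """One greedy decoding step combining three strategies (sketch).

    - repetition penalty: down-weight previously generated tokens
      (divide positive logits by `penalty`, multiply negative ones)
    - forgetting mechanism: only the last `window` tokens are penalized,
      so distant tokens no longer contribute to the penalty
    - length penalty: suppress EOS until `min_len` tokens are generated
    """
    logits = np.array(logits, dtype=float)
    recent = set(generated[-window:])     # forgetting: distant tokens ignored
    for tok in recent:
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    if len(generated) < min_len:          # length penalty: block early EOS
        logits[eos_id] = -np.inf
    return int(np.argmax(logits))         # greedy pick over penalized logits

# Token 0 was just generated, so its logit is penalized and token 1 wins.
assert penalty_decode_step([2.0, 1.9, 0.5, 0.0], [0] * 10, eos_id=2) == 1
```

The forgetting window bounds how long a token stays penalized, which is what relieves the pressure to hand-tune the penalty value for long generations.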


Understanding In-Context Learning from Repetitions

Yan, Jianhao, Xu, Jin, Song, Chiyu, Wu, Chenming, Li, Yafu, Zhang, Yue

arXiv.org Artificial Intelligence

This paper explores the elusive mechanism underpinning in-context learning in Large Language Models (LLMs). Our work provides a novel perspective by examining in-context learning via the lens of surface repetitions. We quantitatively investigate the role of surface features in text generation, and empirically establish the existence of token co-occurrence reinforcement, a principle that strengthens the relationship between two tokens based on their contextual co-occurrences. By investigating the dual impacts of these features, our research illuminates the internal workings of in-context learning and expounds on the reasons for its failures. This paper provides an essential contribution to the understanding of in-context learning and its potential limitations, offering a fresh perspective on this exciting capability.

The impressive ability of Large Language Models (LLMs; Touvron et al. (2023); Chowdhery et al. (2022); OpenAI (2023)) to execute in-context learning (ICL) is a standout characteristic. This behavior mirrors human learning and reasoning from analogy (Winston, 1980), enabling LLMs to rapidly adapt to a range of downstream tasks. Without being explicitly pretrained to learn from demonstrations, LLMs can predict responses to unseen test queries from a few demonstrations and without any instruction given (Brown et al., 2020; Zhang et al., 2022; Chowdhery et al., 2022). An example of in-context learning can be found in Figure 1(a), where a pre-trained LLaMA model is given demonstrations for a binary classification task and learns to make predictions correctly. Despite the success in applications, the working mechanism of in-context learning is still an open question. We take a feature-centric view to understand ICL, analyzing the key patterns in the input context that correlate with ICL behavior. In particular, as Figure 1(b) shows, in-context demonstrations can result not only in desired effects but can also cause errors. In this example, the same LLaMA model makes the incorrect prediction 'True' given the input "Circulation revenue has decreased by 5% in Finland.", which is likely because of the repeated pattern "Answer:" -> "True" from the demonstrations. From the same perspective, the success case in Figure 1(a) can be attributed to learning desired patterns such as "Answer:" -> "True|False" in the demonstrations.
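One way to see a co-occurrence reinforcement effect in a toy setting (purely illustrative; not the paper's LLM experiments): a smoothed bigram model fit on the context alone assigns a probability to "True" after "Answer:" that grows with the number of ("Answer:", "True") co-occurrences in the prompt. The function name, smoothing scheme, and vocabulary size are assumptions made for the sketch.

```python
from collections import Counter

def bigram_prob(context_tokens, prev_tok, next_tok, alpha=1.0, vocab_size=10):
    """Add-alpha-smoothed bigram probability estimated from the context.

    A model driven by surface co-occurrence statistics of its context
    assigns P(next_tok | prev_tok) that grows with the number of
    (prev_tok, next_tok) co-occurrences -- a toy analogue of the
    token co-occurrence reinforcement described above.
    """
    pairs = Counter(zip(context_tokens, context_tokens[1:]))
    prev_count = sum(c for (a, _), c in pairs.items() if a == prev_tok)
    pair_count = pairs[(prev_tok, next_tok)]
    return (pair_count + alpha) / (prev_count + alpha * vocab_size)

# The more often "Answer: True" appears in the demonstrations, the higher
# the estimated probability of "True" after "Answer:".
demo = ["Answer:", "True"]
probs = [bigram_prob(demo * k, "Answer:", "True") for k in (1, 2, 4, 8)]
assert probs == sorted(probs)  # monotonically increasing with repetitions
```

This mirrors both faces of the effect in the abstract: the same mechanism that lets demonstrations teach a desired pattern can also lock the model onto a spurious one.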


Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation

Xu, Jin, Liu, Xiaojiang, Yan, Jianhao, Cai, Deng, Li, Huayang, Li, Jian

arXiv.org Artificial Intelligence

While large-scale neural language models, such as GPT2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in human corpora (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context. Through our quantitative experiments, we find that 1) language models have a preference to repeat the previous sentence; 2) the sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) the sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method, DITTO (PseuDo-RepetITion PenalizaTiOn), where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data. Although our method is motivated by mitigating repetitions, experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.
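The pseudo repetitive data mentioned above can be constructed straightforwardly; a minimal sketch follows (the sampling details, token limit, and function name are assumptions for illustration): pick one sentence from the training corpus and tile it until a maximum sample length is reached, yielding examples on which repetition probabilities can then be penalized.

```python
import random

def build_pseudo_repetition(sentences, max_tokens=64, seed=0):
    """Construct one pseudo repetitive training sample (sketch).

    Randomly picks one sentence (a list of tokens) and repeats it until
    the sample reaches `max_tokens`, truncating the final repeat.
    """
    rng = random.Random(seed)
    sent = rng.choice(sentences)
    sample = []
    while len(sample) < max_tokens:
        sample.extend(sent)  # tile the same sentence back to back
    return sample[:max_tokens]

sample = build_pseudo_repetition([["the", "cat", "sat"], ["a", "dog", "ran"]],
                                 max_tokens=8)
assert len(sample) == 8
assert sample[:3] == sample[3:6]  # the same sentence repeats
```

Because such loops are vanishingly rare in natural corpora (the 0.02% figure above), synthesizing them is the practical way to give the model repetition examples to penalize.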