Mind the Gap: Assessing Temporal Generalization in Neural Language Models

Neural Information Processing Systems 

In the case of GPT-3 (Brown et al., 2020), such tasks include LAMBADA (Paperno et al., 2016) and TriviaQA (Joshi et al., 2017). These evaluation setups have two shortcomings. First, they do not assess a language model's ability to generalize well to future data from beyond its training period, an ability required by applications such as detecting the stance of news articles (Augenstein et al., 2019), forecasting stock prices from the latest news articles (Ding et al., 2015), and answering knowledge-intensive questions like "How many people have been infected by COVID-19?". Second, the temporal overlap between the training and evaluation data increases the risk of "test data contamination", where evaluation examples also appear in the training data; Brown et al. (2020) used n-gram overlap filtering to mitigate this. Nevertheless, language modelling data are not i.i.d.: documents drawn from the same time period naturally overlap in topics and phrasing. This can potentially induce a correlation between the training and evaluation sets that LMs can exploit.
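As an illustrative sketch of the contamination check discussed above (not the exact procedure of Brown et al., 2020; the function names and the choice of n are assumptions for illustration), overlap between training and evaluation text can be estimated by comparing their n-gram sets:

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(train_docs, eval_doc, n=8):
    """Fraction of the evaluation document's n-grams that also occur
    somewhere in the training corpus; a high rate suggests the example
    may have leaked into the training data."""
    train_ngrams = set()
    for doc in train_docs:
        train_ngrams |= ngrams(doc.split(), n)
    eval_ngrams = ngrams(eval_doc.split(), n)
    if not eval_ngrams:
        return 0.0
    return len(eval_ngrams & train_ngrams) / len(eval_ngrams)
```

Crucially, even a zero rate under such a filter does not make the splits i.i.d.: training and evaluation documents from the same period can still share topical and stylistic regularities that a model can exploit.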
