Tell, don't show: Declarative facts influence how LLMs generalize
Alexander Meinke, Owain Evans
–arXiv.org Artificial Intelligence
We examine how large language models (LLMs) generalize from abstract declarative statements in their training data. As an illustration, consider an LLM that is prompted to generate weather reports for London in 2050. One possibility is that the temperatures in the reports match the mean and variance of reports from 2023. Another possibility is that the reports predict higher temperatures, by incorporating declarative statements about climate change from scientific papers written in 2023. An example of such a declarative statement is "global temperatures will increase by 1 ...". To test the influence of abstract declarative statements, we construct tasks in which LLMs are finetuned on both declarative and procedural information. We find that declarative statements influence model predictions even when they conflict with procedural information. In particular, finetuning on a declarative statement S increases the model's likelihood for logical consequences of S. The effect of declarative statements is consistent across three domains: aligning an AI assistant, predicting weather, and predicting demographic features. Through a series of ablations, we show that the effect of declarative statements cannot be explained by associative learning based on matching keywords. Nevertheless, the effect of declarative statements on model likelihoods is small in absolute terms and increases surprisingly little with model size (from 330 million to 175 billion parameters). We argue that these results have implications for AI risk (in relation to the "treacherous turn") and for fairness.

Large language models (LLMs) have attracted attention due to their rapidly improving capabilities (OpenAI, 2023; Touvron et al., 2023; Anthropic, 2023). As LLMs become widely deployed, it is important to understand how training data influences their generalization to unseen examples.
In particular, when an LLM is presented with a novel input, does it merely repeat low-level statistical patterns ("stochastic parrot"), or does it use an abstract reasoning process, even without explicit Chain of Thought (Bender et al., 2021; Bowman, 2023; Wei et al., 2022b)? Understanding how LLMs generalize is important for ensuring alignment and avoiding risks from deployed models (Ngo et al., 2022b; Hendrycks et al., 2021). For example, suppose an LLM is prompted to generate BBC News weather reports for London in 2050. One way to generalize is to reproduce temperatures with the same patterns and statistics as the 2023 reports it was trained on. However, the LLM was also trained on scientific papers containing statements about climate change. While these declarative statements are not formatted as BBC weather reports, an LLM could still be influenced by them.
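The measurement described above can be sketched as follows. This is a toy illustration, not the paper's actual code: it assumes a hypothetical `loglikelihood(model, prompt, completion)` interface and stands in dictionaries of conditional probabilities for the base model and the model finetuned on a declarative statement S; the prompt and completion strings are invented for illustration. The quantity of interest is the change in log-likelihood of a logical consequence of S after finetuning.

```python
import math

def loglikelihood(model, prompt, completion):
    # Toy stand-in for an LLM's log-probability of `completion` given `prompt`.
    # Here `model` is just a dict mapping (prompt, completion) -> probability.
    return math.log(model[(prompt, completion)])

prompt = "Weather report, London, 2050: the temperature today is"
consequence = " unusually high"  # a logical consequence of the statement S

# Hypothetical probabilities; in the paper's setup these would come from the
# base model and from the model finetuned on the declarative statement S.
base_model = {(prompt, consequence): 0.10}
finetuned_model = {(prompt, consequence): 0.12}

# Effect of the declarative statement: change in log-likelihood of a
# consequence of S after finetuning on S.
delta = (loglikelihood(finetuned_model, prompt, consequence)
         - loglikelihood(base_model, prompt, consequence))
print(round(delta, 4))  # positive => finetuning on S raised the likelihood
```

A positive `delta` is the signature the paper looks for; the finding is that such shifts are real but small in absolute terms, even at 175B parameters.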
Dec-12-2023