Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
McCoy, R. Thomas, Yao, Shunyu, Friedman, Dan, Hardy, Matthew, Griffiths, Thomas L.
–arXiv.org Artificial Intelligence
The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. This approach - which we call the teleological approach - leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. We predict that LLMs will achieve higher accuracy when these probabilities are high than when they are low - even in deterministic settings where probability should not matter. To test our predictions, we evaluate two LLMs (GPT-3.5 and GPT-4) on eleven tasks, and we find robust evidence that LLMs are influenced by probability in the ways that we have hypothesized. In many cases, the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability word sequence but only 13% when it is low-probability. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system - one that has been shaped by its own particular set of pressures.
arXiv.org Artificial Intelligence
Sep-24-2023
- Country:
- Asia
- Indonesia (0.04)
- Japan > Honshū
- Chūbu > Toyama Prefecture > Toyama (0.04)
- Middle East
- Republic of Türkiye (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Europe
- Austria > Vienna (0.13)
- Denmark (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Tuscany
- Florence (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- Oxfordshire > Oxford (0.13)
- North America
- Canada > Ontario
- Toronto (0.04)
- Dominican Republic (0.04)
- United States
- Indiana
- Lake County > Griffith (0.04)
- Marion County > Indianapolis (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.13)
- New York (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- Texas > Travis County
- Austin (0.04)
- Washington > King County
- Seattle (0.04)
- Indiana
- Canada > Ontario
- Asia
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Research Report
- Industry:
- Technology: