Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

McCoy, R. Thomas, Yao, Shunyu, Friedman, Dan, Hardy, Matthew, Griffiths, Thomas L.

Sep-24-2023–arXiv.org Artificial Intelligence

The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that to develop a holistic understanding of these systems, we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. This approach, which we call the teleological approach, leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. We predict that LLMs will achieve higher accuracy when these probabilities are high than when they are low, even in deterministic settings where probability should not matter. To test our predictions, we evaluate two LLMs (GPT-3.5 and GPT-4) on eleven tasks, and we find robust evidence that LLMs are influenced by probability in the ways that we have hypothesized. In many cases, the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability word sequence but only 13% when it is low-probability. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they were humans but should instead treat them as a distinct type of system, one that has been shaped by its own particular set of pressures.
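To make the cipher example concrete, here is a minimal sketch of what a deterministic decoding task of this kind could look like. It assumes the "simple cipher" is a letter-shift cipher with a shift of 13 (rot-13); the abstract does not name the cipher, and the function names `shift_decode` and `shift_encode` are hypothetical, chosen only for illustration.

```python
import string


def shift_decode(ciphertext: str, shift: int = 13) -> str:
    """Decode a shift cipher by moving each letter back `shift` positions.

    Assumption: the cipher is a plain alphabetic shift (rot-13 by default).
    Non-letter characters are passed through unchanged.
    """
    decoded = []
    for ch in ciphertext:
        if ch in string.ascii_lowercase:
            decoded.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            decoded.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            decoded.append(ch)
    return "".join(decoded)


def shift_encode(plaintext: str, shift: int = 13) -> str:
    """Encode by shifting forward, which is decoding with a negative shift."""
    return shift_decode(plaintext, -shift)


# The rule is the same whether the plaintext is a likely or an unlikely
# word sequence; a fixed procedure recovers both exactly.
likely = "Hello, world!"
unlikely = "world! Hello,"
for text in (likely, unlikely):
    assert shift_decode(shift_encode(text)) == text
```

Because the procedure is a fixed letter-by-letter mapping, its correctness does not depend on how probable the decoded text is. That is the sense in which the setting is deterministic, and why a gap in accuracy between high- and low-probability outputs is notable.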

next-word prediction, reverse alphabetical order, statistically significant effect

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - New York (0.04)
    - Washington > King County
      - Seattle (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.13)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Indiana
      - Marion County > Indianapolis (0.04)
      - Lake County > Griffith (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Austria > Vienna (0.13)
  - Denmark (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)
    - Oxfordshire > Oxford (0.13)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Indonesia (0.04)
  - Middle East
    - Republic of Türkiye (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
  - Japan > Honshū
    - Chūbu > Toyama Prefecture > Toyama (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology (0.67)
- Education (0.67)
- Health & Medicine > Therapeutic Area
  - Neurology (0.45)
- Government > Regional Government
  - North America Government > United States Government (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
