LLMs are Not Just Next Token Predictors
Downes, Stephen M., Forber, Patrick, Grzankowski, Alex
arXiv.org – Artificial Intelligence
LLMs are statistical models of language learning through stochastic gradient descent with a next token prediction objective, prompting a popular view among AI modelers: LLMs are just next token predictors. While LLMs are engineered using next token prediction, and trained based on their success at this task, our view is that reducing them to just next token predictors sells LLMs short. Moreover, important explanations of LLM behavior and capabilities are lost when we engage in this kind of reduction. To draw this out, we will make an analogy with a once prominent research program in biology that explained evolution and development from the gene's eye view.

The view that LLMs are 'just next token predictors' is popular among AI modelers, and is explicitly laid out by Shanahan (2024): "A great many tasks that demand intelligence in humans can be reduced to next-token prediction with a sufficiently performant model" (2024, 68), and "surely what they are doing is more than 'just' next-token prediction? Well, it is an engineering fact that this is what an LLM does. The noteworthy thing is that next-token prediction is sufficient for solving previously unseen reasoning problems" (2024, 77).
Aug-6-2024