The pitfalls of next-token prediction
Bachmann, Gregor, Nagarajan, Vaishnavh
arXiv.org Artificial Intelligence
Can a mere next-token predictor faithfully model human intelligence? We crystallize this emerging concern, correct popular misconceptions surrounding it, and advocate a simple multi-token objective. As a starting point, we argue that the two often-conflated phases of next-token prediction -- autoregressive inference and teacher-forced training -- must be treated distinctly. The popular criticism that errors can compound during autoregressive inference crucially assumes that teacher-forcing has learned an accurate next-token predictor. This assumption sidesteps a more deep-rooted problem that we expose: in certain classes of tasks, teacher-forcing can simply fail to learn an accurate next-token predictor in the first place. We describe a general mechanism by which teacher-forcing can fail, and design a minimal planning task where both the Transformer and the Mamba architecture empirically fail in that manner -- remarkably, despite the task being straightforward to learn. Finally, we provide preliminary evidence that this failure can be resolved using a simple modification that predicts multiple tokens in advance. We hope this finding can ground future debates and inspire explorations beyond the next-token prediction paradigm. We make our code available at https://github.com/gregorbachmann/Next-Token-Failures
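To make the distinction between the two phases concrete, here is a minimal sketch in plain Python. It is not taken from the paper or its repository: the counting task, the deliberately flawed `predict_next` function, and all helper names are hypothetical, chosen only for illustration. It shows the "popular criticism" the abstract refers to: a predictor that is wrong at a single position when conditioned on ground-truth prefixes (teacher forcing) can drift for the rest of the sequence when conditioned on its own outputs (autoregressive inference). It does not reproduce the paper's separate claim that teacher-forced training itself can fail.

```python
def predict_next(prefix):
    """Hypothetical imperfect predictor for the counting sequence 0, 1, 2, ...

    It is correct everywhere except right after the token 3, where it skips ahead.
    """
    last = prefix[-1]
    return last + 2 if last == 3 else last + 1


def teacher_forced_predictions(target):
    # Every prediction is conditioned on the ground-truth prefix.
    return [predict_next(target[:t]) for t in range(1, len(target))]


def autoregressive_predictions(target):
    # Only the first token is given; later steps see the model's own outputs.
    generated = [target[0]]
    for _ in range(1, len(target)):
        generated.append(predict_next(generated))
    return generated[1:]


def count_errors(predictions, target):
    # Compare predictions for positions 1..n-1 against the true tokens.
    return sum(p != y for p, y in zip(predictions, target[1:]))


target = list(range(8))
tf_errors = count_errors(teacher_forced_predictions(target), target)
ar_errors = count_errors(autoregressive_predictions(target), target)
print(tf_errors, ar_errors)
```

With this toy setup the script prints `1 4`: the single mistake stays isolated under teacher forcing, while under autoregressive decoding every token after the mistake is also wrong.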
Jul-5-2024
- Country:
  - Oceania
    - New Zealand > North Island
      - Auckland Region > Auckland (0.04)
    - Australia > New South Wales
      - Sydney (0.04)
  - North America
    - United States
      - New York (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - California > Los Angeles County
        - Long Beach (0.04)
    - Canada
      - Ontario > Toronto (0.04)
      - Quebec > Montreal (0.04)
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
  - Europe
  - Asia
    - Macao (0.04)
    - China (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Africa > Rwanda
- Genre:
  - Research Report (0.50)
- Industry:
  - Education > Curriculum > Subject-Specific Education (0.45)
- Technology: