Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale

Jun-19-2026, 22:07:30 GMT–Neural Information Processing Systems

Based on our analysis of over 1,400 language model checkpoints on over 110,000 tokens of English, we find that up to 98% of the variance in language model behavior at the word level can be explained by three simple heuristics: the unigram probability (frequency) of a given word, the n-gram probability of the word, and the semantic similarity between the word and its context. Furthermore, we see consistent behavioral phases in all language models, with their predicted probabilities for words overfitting to those words' n-gram probabilities for increasing n over the course of training. Taken together, these results suggest that learning in neural language models may follow a similar trajectory irrespective of model details.
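As a rough illustration of the heuristics named above, the following minimal sketch computes smoothed unigram and bigram log-probabilities for a toy token sequence and measures how much of the variance in a set of per-token scores each one explains. It is not the authors' pipeline: the toy corpus, the add-alpha smoothing, the helper names `ngram_logprobs` and `r_squared`, and the synthetic `model_scores` stand-in for real language model log-probabilities are all assumptions made for illustration, and the semantic-similarity heuristic is omitted.

```python
import math
from collections import Counter


def ngram_logprobs(tokens, n, alpha=1.0):
    """Add-alpha smoothed log P(token | previous n-1 tokens) for positions i >= n-1.

    Hypothetical helper: the smoothing scheme is an assumption, not the paper's.
    """
    vocab_size = len(set(tokens))
    positions = range(n - 1, len(tokens))
    grams = Counter(tuple(tokens[i - n + 1:i + 1]) for i in positions)
    contexts = Counter(tuple(tokens[i - n + 1:i]) for i in positions)
    out = []
    for i in positions:
        gram = tuple(tokens[i - n + 1:i + 1])
        prob = (grams[gram] + alpha) / (contexts[gram[:-1]] + alpha * vocab_size)
        out.append(math.log(prob))
    return out


def r_squared(x, y):
    """Squared Pearson correlation: variance in y explained by a linear fit on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Toy corpus (made up for this sketch).
tokens = "the cat sat on the mat and the cat ate".split()

unigram = ngram_logprobs(tokens, 1)  # one value per token
bigram = ngram_logprobs(tokens, 2)   # one value per token from position 1 onward

# Synthetic stand-in for a model's per-token log-probabilities: an equal mix of
# the two heuristics. Real values would come from a language model checkpoint.
model_scores = [0.5 * u + 0.5 * b for u, b in zip(unigram[1:], bigram)]

r2_unigram = r_squared(unigram[1:], model_scores)
r2_bigram = r_squared(bigram, model_scores)
```

Repeating a comparison like this across training checkpoints, with larger n and real model outputs, is one way to picture the behavioral phases described in the abstract; the paper's actual analysis combines all three heuristics and may differ in its details.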

Keywords: large language model, machine learning, natural language

Country:
- North America > United States (1.00)
- Europe (1.00)
- Asia > Middle East (0.67)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Education (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Text Processing (0.68)
    - Chatbot (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
