Critical Phase Transition in a Large Language Model
Nakaishi, Kai, Nishikawa, Yoshihiko, Hukushima, Koji
arXiv.org Artificial Intelligence
The performance of large language models (LLMs) strongly depends on the temperature parameter. Empirically, at very low temperatures, LLMs generate sentences with clear repetitive structures, while at very high temperatures, generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature as well as in a natural language dataset. We also argue that several statistical quantities characterizing the criticality should be useful for evaluating the performance of LLMs.
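As a minimal illustration of what the temperature parameter does, the sketch below rescales next-token logits by 1/T before applying a softmax. This is the standard sampling convention and is assumed here, not taken from the paper; the function name `softmax_with_temperature` and the example logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return probabilities p_i proportional to exp(logit_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens (illustration only).
logits = [2.0, 1.0, 0.0]

low = softmax_with_temperature(logits, 0.1)    # nearly deterministic
high = softmax_with_temperature(logits, 10.0)  # nearly uniform
```

At low T almost all probability mass falls on the top token, which is consistent with the repetitive regime described above; at high T the distribution flattens toward uniform, consistent with the incomprehensible regime.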
Jun-7-2024
- Country:
- Asia > Japan
- Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > United States (0.04)
- Oceania > Australia
- Genre:
- Research Report > New Finding (1.00)