States of LLM-generated Texts and Phase Transitions between them

Mikhaylovskiy, Nikolay

arXiv.org Artificial Intelligence 

Not long ago, probabilistic autoregressive language models were simply models that assign probabilities to sequences of words (Bahl et al., 1983); now they are the cornerstone of any task in computational linguistics, through prompting (Sanh et al., 2022) or fine-tuning (Radford et al., 2018). As such models are successfully commercialized, the number of their practical applications is growing rapidly, as is the number of papers considering various aspects of their use. It is all the more surprising that the statistical properties of the output sequences produced by such models have received relatively little study. We aim to partly fill this gap and empirically demonstrate that, depending on the temperature parameter, LLMs can generate text that, from the point of view of autocorrelation analysis, can be classified as solid (periodic phase), critical (a state in which autocorrelations decay according to a power law), or gas (amorphous phase). Our main contributions are the following:

1. We clearly identify three phases of LLM-generated texts: periodic, critical and amorphous.
2. We show through computational experiments that LLM-generated texts undergo a phase transition from the ordered to the amorphous state at about the same temperatures, between 0.7 and 1, for different LLMs.
3. We show that in the amorphous state, the decay of long-range autocorrelations follows an exponential law independently of the generation temperature, for different LLMs.
4. We show that for temperatures between 0.7 and 1, autocorrelations exhibit power-law decay at medium distances of up to 2000 words, implying islands of connectivity of these sizes.

We go on to introduce the key concepts.
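To make the distinction between power-law and exponential decay concrete, the following is a minimal sketch, not the procedure used in this work. It assumes the text has already been mapped to a numeric series by some encoding (a hypothetical preprocessing step not specified here), estimates the sample autocorrelation at a given lag, and then labels a decay curve by checking which linearisation fits better: log C against log d (power law) or log C against d (exponential). The function names `autocorrelation`, `r_squared` and `classify_decay` are illustrative only.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a numeric series at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def r_squared(xs, ys):
    """Coefficient of determination of a least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def classify_decay(lags, corrs):
    """Label a decay curve 'power law' or 'exponential'.

    Assumption: a simple comparison of R^2 between the log-log and
    semi-log fits is enough to tell the two apart; only points with
    positive lag and positive correlation are used.
    """
    pts = [(d, c) for d, c in zip(lags, corrs) if d > 0 and c > 0]
    if len(pts) < 3:
        raise ValueError("need at least three positive points to fit")
    log_c = [math.log(c) for _, c in pts]
    r2_power = r_squared([math.log(d) for d, _ in pts], log_c)  # log C vs log d
    r2_exp = r_squared([float(d) for d, _ in pts], log_c)       # log C vs d
    return "power law" if r2_power > r2_exp else "exponential"
```

In practice one would compute `autocorrelation(series, d)` over a range of lags `d` and pass the resulting curve to `classify_decay`. Real autocorrelation estimates are noisy and can be negative at long lags, so a careful analysis would need a more robust fitting procedure than this R^2 comparison.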
