Fractal Patterns May Unravel the Intelligence in Next-Token Prediction

Alabdulmohsin, Ibrahim; Tran, Vinh Q.; Dehghani, Mostafa

arXiv.org Artificial Intelligence 

We study the fractal structure of language, aiming to provide a precise formalism for quantifying properties that may have been previously suspected but not formally shown. We establish that language is: (1) self-similar, exhibiting complexities at all levels of granularity, with no particular characteristic context length, and (2) long-range dependent (LRD), with a Hurst parameter of approximately H = 0.70 ± 0.09. Based on these findings, we argue that short-term patterns/dependencies in language, such as in paragraphs, mirror the patterns/dependencies over larger scopes, like entire documents.

Self-similar processes were introduced by Kolmogorov in 1940 (Kolmogorov, 1940). The notion garnered considerable attention during the late 1960s, thanks to the extensive works of Mandelbrot and his peers (Embrechts & Maejima, 2000). Broadly speaking, an object is called "self-similar" if it is invariant across scales, meaning its statistical or geometric properties stay consistent irrespective of the magnification applied to it (see Figure 1). Nature and geometry furnish us with many such patterns, such as coastlines, snowflakes, the Cantor set and the Koch curve. Despite the distinction, self-similarity is often discussed in the context of "fractals," another term popularized by Mandelbrot in his seminal book The Fractal Geometry of Nature (Mandelbrot, 1982). However, the two concepts are