A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures
Gromov, Vasilii A., Borodin, Nikita S., Yerbolova, Asel S.
arXiv.org Artificial Intelligence
The present paper introduces a novel object of study: a language fractal structure. We hypothesize that the set of embeddings of all $n$-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all $n$.) The paper estimates the intrinsic (genuine) dimensions of language fractal structures for the Russian and English languages. To this end, we employ methods based on (1) topological data analysis and (2) the minimum spanning tree of a data graph built on the point cloud under consideration (Steele's theorem). For both languages and for all $n$, the intrinsic dimensions appear to be non-integer (typical of fractal sets) and close to 9.
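The minimum-spanning-tree approach mentioned in the abstract can be illustrated with a small sketch. It relies on the standard consequence of Steele's theorem that, for $m$ points sampled from a $d$-dimensional set, the total Euclidean MST length grows roughly as $m^{(d-1)/d}$, so the slope $s$ of $\log L$ against $\log m$ gives $d \approx 1/(1-s)$. This is not necessarily the authors' exact procedure: the function names, the subsample sizes, and the synthetic test data (a 2-dimensional square linearly embedded in 5 dimensions, standing in for $n$-gram embeddings) are all assumptions made for illustration.

```python
import math
import random


def mst_length(points):
    """Total Euclidean edge length of the minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        # Relax the distances of the remaining points against the new tree point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def estimate_dimension(points, sizes):
    """Fit log(MST length) against log(sample size); return 1 / (1 - slope)."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(mst_length(points[:m])) for m in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return 1.0 / (1.0 - slope)


def embed(x, y):
    """Hypothetical linear embedding of a 2-D point into 5-D space."""
    return (x, y, x + y, x - y, 0.5 * x)


# Synthetic data (assumption): 800 points with intrinsic dimension 2.
rng = random.Random(0)
cloud = [embed(rng.random(), rng.random()) for _ in range(800)]
d_hat = estimate_dimension(cloud, sizes=[100, 200, 400, 800])
print(f"estimated intrinsic dimension: {d_hat:.2f}")
```

On this toy cloud the estimate should land near the true intrinsic dimension of 2 rather than the ambient dimension of 5, which is the property that makes the method useful for embedding clouds. For real embeddings one would likely replace the quadratic-time Prim loop with a faster nearest-neighbor-based MST and average over several random subsamples per size.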
Nov-20-2023
- Country:
  - North America > United States
    - New Jersey (0.04)
  - Europe
    - Czechia > Prague (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
    - Russia > Central Federal District
      - Moscow Oblast > Moscow (0.04)
  - Asia
    - Russia (0.04)
    - India > West Bengal
      - Kolkata (0.04)
- Genre:
  - Research Report (1.00)
- Technology: