Pragmatic Constraint on Distributional Semantics
Zhemchuzhina, Elizaveta, Filippov, Nikolai, Yamshchikov, Ivan P.
arXiv.org Artificial Intelligence
This paper studies the limits of statistical learning in language models in the context of Zipf's law. First, we demonstrate that a Zipf-law token distribution emerges irrespective of the chosen tokenization. Second, we show that the Zipf distribution is characterized by two distinct groups of tokens that differ in both frequency and semantics: tokens that correspond one-to-one to a single semantic concept have different statistical properties from semantically ambiguous tokens. Finally, we demonstrate how these properties interfere with statistical learning procedures motivated by distributional semantics.
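As an illustration of the rank-frequency relationship the abstract refers to, the following is a minimal sketch, not the authors' procedure. It assumes the common formulation of Zipf's law, in which the frequency of the token at rank r is proportional to r^(-s), and estimates s as the negated least-squares slope of log-frequency against log-rank. The function names `rank_frequency` and `zipf_exponent`, the whitespace tokenization, and the toy corpus are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token counts sorted from most frequent to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate s in f(r) ~ r**(-s) by least squares in log-log space.

    `frequencies` must be sorted in descending order, so that the
    position in the list (starting at 1) is the rank.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is negative for a decaying distribution; negate it
    # so that the returned exponent is positive.
    return -cov / var


# Toy usage with whitespace tokenization (an assumption of this sketch;
# any other tokenizer could be substituted here).
tokens = "the cat sat on the mat and the dog sat on the cat".split()
frequencies = rank_frequency(tokens)
print(frequencies, zipf_exponent(frequencies))
```

Swapping in a different tokenizer and comparing the fitted exponents is one simple way to probe the claim that the Zipf-like shape does not depend on the tokenization; a corpus this small is only useful for checking that the code runs.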
Nov-20-2022
- Country:
  - Asia > Russia (0.04)
  - North America > United States
    - Massachusetts > Middlesex County > Cambridge (0.04)
  - Europe
    - Russia > Northwestern Federal District
      - Leningrad Oblast > Saint Petersburg (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Germany > Saxony
      - Leipzig (0.04)
    - Russia > Northwestern Federal District
- Genre:
  - Research Report (1.00)
- Technology: