Pragmatic Constraint on Distributional Semantics

Zhemchuzhina, Elizaveta, Filippov, Nikolai, Yamshchikov, Ivan P.

Nov-20-2022–arXiv.org Artificial Intelligence

This paper studies the limits of statistical learning in language models in the context of Zipf's law. First, we demonstrate that a Zipfian token distribution emerges irrespective of the chosen tokenization. Second, we show that the Zipf distribution is characterized by two distinct groups of tokens that differ in both frequency and semantics. Namely, tokens that correspond one-to-one to a single semantic concept have different statistical properties from semantically ambiguous tokens. Finally, we demonstrate how these properties interfere with statistical learning procedures motivated by distributional semantics.
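For readers unfamiliar with Zipf's law, it states that the frequency of the token at rank r is roughly proportional to r^(-s), with s close to 1 for natural text. The sketch below is not the authors' procedure; it is a minimal illustration, assuming a plain least-squares fit of log-frequency against log-rank, of how such an exponent could be estimated from any token sequence. The function names `zipf_exponent` and `synthetic_zipf_corpus` are hypothetical, and the corpus is synthetic rather than data from the paper.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate s in f(r) ~ r^(-s) by least squares on log-rank vs. log-frequency.

    Illustrative only: a simple log-log regression, not the paper's method.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tokens to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying distribution; s is its negation.
    return -sxy / sxx


def synthetic_zipf_corpus(vocab_size=50, scale=1000):
    """Build a toy corpus whose rank-r token occurs about scale / r times (assumption)."""
    tokens = []
    for rank in range(1, vocab_size + 1):
        tokens.extend([f"w{rank}"] * max(1, round(scale / rank)))
    return tokens


if __name__ == "__main__":
    corpus = synthetic_zipf_corpus()
    print(f"estimated Zipf exponent: {zipf_exponent(corpus):.3f}")
```

To probe the claim that the distribution does not depend on tokenization, the same function could be applied to one text split in different ways (for example by whitespace, by characters, or by a subword tokenizer) and the fitted exponents compared.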

artificial intelligence, machine learning, natural language

Country:
- Asia > Russia (0.04)
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Europe
  - Russia > Northwestern Federal District
    - Leningrad Oblast > Saint Petersburg (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Germany > Saxony
    - Leipzig (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language
    - Machine Translation (0.68)
    - Text Processing (0.49)
