Zipfian Whitening

Mar-27-2025, 11:32:58 GMT–Neural Information Processing Systems

The word embedding space in neural models is skewed, and correcting this skew can improve task performance. We point out that most approaches for modeling, correcting, and measuring the symmetry of an embedding space implicitly assume that word frequencies are uniform; in reality, word frequencies follow a highly non-uniform distribution known as Zipf's law. Surprisingly, simply performing PCA whitening weighted by the empirical word frequencies, which follow Zipf's law, significantly improves task performance and surpasses established baselines. From a theoretical perspective, both our approach and existing methods can be clearly categorized: word representations are distributed according to an exponential family with either a uniform or a Zipfian base measure. Adopting the latter naturally emphasizes informative low-frequency words in terms of their vector norm, which becomes evident from the information-geometric perspective [42] and from the loss functions for imbalanced classification [36].
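Frequency-weighted PCA whitening can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' reference implementation: it assumes the embeddings are the rows of a (V, d) matrix, that unigram counts are available for every vocabulary word, and that the weighted covariance is full rank. The function name `zipfian_whitening` and the toy data are hypothetical. Each word contributes to the mean and covariance in proportion to its empirical probability p(w); replacing p with uniform weights 1/V recovers ordinary centering plus PCA whitening, i.e. the implicit uniform-frequency assumption the abstract describes.

```python
import numpy as np


def zipfian_whitening(embeddings: np.ndarray, freqs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Frequency-weighted PCA whitening (hypothetical helper, sketch only).

    embeddings: (V, d) word vectors, one row per vocabulary word.
    freqs: (V,) raw counts or probabilities; normalized to p(w) below.
    eps: small constant guarding against division by near-zero eigenvalues.
    """
    p = freqs / freqs.sum()                      # empirical word distribution p(w)
    mu = p @ embeddings                          # weighted mean: sum_w p(w) v_w
    centered = embeddings - mu
    # Weighted covariance: sum_w p(w) (v_w - mu)(v_w - mu)^T
    cov = (centered * p[:, None]).T @ centered
    eigvals, eigvecs = np.linalg.eigh(cov)       # covariance is symmetric PSD
    # Rotate onto the principal axes, then scale each axis to unit weighted variance
    whitener = eigvecs / np.sqrt(eigvals + eps)
    return centered @ whitener


# Toy usage with synthetic data (assumption: Zipfian counts proportional to 1/rank)
rng = np.random.default_rng(0)
V, d = 1000, 5
E = rng.normal(size=(V, d)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1]) + 3.0  # anisotropic, off-center
counts = 1.0 / np.arange(1, V + 1)
Z = zipfian_whitening(E, counts)

p = counts / counts.sum()
print(np.allclose(p @ Z, 0.0, atol=1e-8))                       # weighted mean is ~0
print(np.allclose((Z * p[:, None]).T @ Z, np.eye(d), atol=1e-6))  # weighted covariance is ~I
```

Because the statistics are weighted by p(w), the whitened space is isotropic with respect to the Zipfian distribution rather than the uniform one, so frequent words dominate the estimated mean and axes while rare words are free to lie farther from the origin.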

artificial intelligence, machine learning, natural language

Country:
- Asia > Japan
  - Honshū (0.14)
- Europe (1.00)
- North America > United States
  - California (0.14)
  - Colorado (0.14)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Health & Medicine (0.46)
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.45)
  - Natural Language > Text Processing (0.68)
  - Representation & Reasoning (1.00)
