Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing

Martinez, Richard Diehl, Goriely, Zebulon, Caines, Andrew, Buttery, Paula, Beinborn, Lisa

Oct-15-2024–arXiv.org Artificial Intelligence

Language models strongly rely on frequency information because they maximize the likelihood of tokens during pre-training. As a consequence, language models tend to not generalize well to tokens that are seldom seen during training. Moreover, maximum likelihood training has been discovered to give rise to anisotropy: representations of tokens in a model tend to cluster tightly in a high-dimensional cone, rather than spreading out over their representational capacity. Our work introduces a method for quantifying the frequency bias of a language model by assessing sentence-level perplexity with respect to token-level frequency. We then present a method for reducing the frequency bias of a language model by inducing a syntactic prior over token representations during pre-training. Our Syntactic Smoothing method adjusts the maximum likelihood objective function to distribute the learning signal to syntactically similar tokens. This approach results in better performance on infrequent English tokens and a decrease in anisotropy. We empirically show that the degree of anisotropy in a model correlates with its frequency bias.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

Oct-15-2024

arXiv.org PDF

Add feedback

Country:
- Asia (0.68)
- North America > United States
  - Minnesota (0.28)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (0.47)
  - Machine Learning
    - Learning Graphical Models (0.54)
    - Neural Networks (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found