gzip Predicts Data-dependent Scaling Laws
Past work has established scaling laws that predict the performance of a neural language model (LM) as a function of its parameter count and the number of tokens it is trained on, enabling optimal allocation of a fixed compute budget. Are these scaling laws agnostic to training data, as some prior work suggests? We generate training datasets of varying complexities by modulating the syntactic properties of a PCFG (probabilistic context-free grammar), finding that 1) scaling laws are sensitive to differences in data complexity and that 2) gzip, a compression algorithm, is an effective predictor of how data complexity impacts scaling properties. We propose a new data-dependent scaling law for LMs that accounts for the training data's gzip-compressibility; its compute-optimal frontier shifts toward preferring dataset size over parameter count as the training data becomes harder to compress.
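The gzip-compressibility the abstract refers to can be approximated with Python's standard library. The sketch below is an illustrative assumption, not necessarily the exact metric the paper fits: it scores a corpus by the ratio of gzip-compressed size to raw size, so lower-complexity (more repetitive) data yields a smaller ratio.

```python
import gzip
import random
import string


def gzip_compressibility(texts):
    """Estimate corpus complexity as compressed size / raw size.

    Lower ratios mean the data is easier to compress (lower complexity);
    ratios closer to 1.0 mean the data is harder to compress.
    """
    raw = "\n".join(texts).encode("utf-8")
    compressed = gzip.compress(raw)
    return len(compressed) / len(raw)


# Example: a repetitive corpus compresses far better than random-looking text.
random.seed(0)
simple = ["the cat sat on the mat"] * 1000
noisy = ["".join(random.choices(string.ascii_lowercase, k=24)) for _ in range(1000)]

print(gzip_compressibility(simple))  # small ratio: easy to compress
print(gzip_compressibility(noisy))   # larger ratio: harder to compress
```

Under this reading, the paper's claim is that corpora with higher ratios (harder to compress) push the compute-optimal frontier toward spending the budget on more training tokens rather than more parameters.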
arXiv.org Artificial Intelligence
May-26-2024
- Country:
  - North America > United States > California > San Francisco County > San Francisco (0.14)
  - North America > United States > Louisiana (0.14)
- Genre:
  - Research Report (1.00)
- Technology: