An exactly solvable model for emergence and scaling laws
Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Ard Louis
Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time ($T$), training data ($D$), or model size ($N$) increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute ($C$). We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size, or model size increases in the neural network.
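The multitask sparse parity setup mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: each task is a parity over its own fixed subset of input bits, tasks are sampled with power-law frequencies, and the active task is signaled by one-hot control bits prepended to the input. All parameter values (`n_tasks`, `n_bits`, `k_parity`, `alpha`) are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters (illustrative, not from the paper):
n_tasks, n_bits, k_parity, alpha = 8, 32, 3, 1.5

# Power-law task frequencies: P(task s) proportional to s^(-alpha), s = 1..n_tasks.
probs = np.arange(1, n_tasks + 1, dtype=float) ** (-alpha)
probs /= probs.sum()

# Each task computes the parity of its own fixed random subset of the data bits.
subsets = [rng.choice(n_bits, size=k_parity, replace=False)
           for _ in range(n_tasks)]

def sample_batch(batch_size):
    """Draw (x, y): x = [one-hot task bits | random data bits], y = task parity."""
    tasks = rng.choice(n_tasks, size=batch_size, p=probs)
    bits = rng.integers(0, 2, size=(batch_size, n_bits))
    y = np.array([bits[i, subsets[t]].sum() % 2
                  for i, t in enumerate(tasks)])
    task_onehot = np.eye(n_tasks, dtype=int)[tasks]
    x = np.concatenate([task_onehot, bits], axis=1)
    return x, y

x, y = sample_batch(16)
```

Because the power-law exponent controls how rarely later tasks appear, a network trained on such data sees progressively fewer examples of each subsequent skill, which is the mechanism behind the staggered, sigmoidal skill emergence the abstract describes.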
Apr-26-2024