The Quantization Model of Neural Scaling
Eric J. Michaud – Neural Information Processing Systems
We propose the Quantization Model of neural scaling laws, explaining both the observed power-law drop-off of loss with model and data size and the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, in which network knowledge and skills are "quantized" into discrete chunks (quanta). We show that when quanta are learned in order of decreasing use frequency, a power law in use frequencies explains the observed power-law scaling of loss.
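The abstract's argument can be illustrated with a minimal numerical sketch (not taken from the paper): assume quanta have Zipfian use frequencies with tail exponent α + 1, that a model knows the n most frequently used quanta, and that each unlearned quantum contributes a loss proportional to its use frequency. Under these assumptions the expected loss falls off roughly as n^(-α). The parameter values and the helper `expected_loss` below are purely illustrative.

```python
import numpy as np

# Sketch of the Quantization Hypothesis mechanism (illustrative assumptions):
# quanta have Zipfian use frequencies p_k ~ k^(-(alpha + 1)), and a model of a
# given size has learned the n most frequently used quanta. The expected loss
# is then the total frequency mass of the quanta it has NOT learned.

alpha = 0.5                  # assumed tail exponent of the quanta frequency distribution
num_quanta = 1_000_000       # size of the (truncated) pool of quanta

k = np.arange(1, num_quanta + 1)
p = k ** (-(alpha + 1.0))
p /= p.sum()                 # normalized use frequencies

def expected_loss(n_learned: int) -> float:
    """Expected loss when the n_learned most frequent quanta are known:
    each unlearned quantum contributes loss weighted by its use frequency."""
    return float(p[n_learned:].sum())

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n = {n:>7d}  loss ~ {expected_loss(n):.4e}")

# The printed losses shrink roughly as n**(-alpha): a power law in the number
# of learned quanta, mirroring the scaling argument in the abstract.
```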