The Quantization Model of Neural Scaling
Neural Information Processing Systems
We propose the Quantization Model of neural scaling laws, which explains both the observed power-law decrease of loss with model and data size and the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, under which network knowledge and skills are quantized into discrete chunks (quanta). We show that when quanta are learned in order of decreasing use frequency, a power law in use frequencies explains the observed power-law scaling of loss.
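The last claim can be illustrated with a small numerical sketch. It is not code from the paper, and everything in it is an assumption made for illustration: use frequencies follow a Zipf-like law p_k proportional to k^-(alpha + 1), a model that has learned the n most frequent quanta incurs a fixed loss `loss_unlearned` on samples that need an unlearned quantum and `loss_learned` otherwise, and the function names are hypothetical. Under those assumptions the expected loss should fall roughly as n^-alpha.

```python
import math


def quanta_frequencies(num_quanta, alpha):
    """Normalized use frequencies p_k proportional to k^-(alpha + 1) (assumed form)."""
    weights = [k ** -(alpha + 1.0) for k in range(1, num_quanta + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_loss(freqs, n_learned, loss_unlearned=1.0, loss_learned=0.0):
    """Expected loss when the n_learned most frequent quanta have been learned.

    freqs must be sorted by decreasing frequency, so learning "in order of
    decreasing use frequency" means learning a prefix of the list.
    """
    learned_mass = sum(freqs[:n_learned])
    unlearned_mass = sum(freqs[n_learned:])
    return loss_learned * learned_mass + loss_unlearned * unlearned_mass


def fitted_exponent(freqs, n_small, n_large):
    """Slope of log(loss) against log(n) between two model sizes, sign-flipped."""
    loss_small = expected_loss(freqs, n_small)
    loss_large = expected_loss(freqs, n_large)
    return -(math.log(loss_large) - math.log(loss_small)) / (
        math.log(n_large) - math.log(n_small)
    )


if __name__ == "__main__":
    alpha = 0.5  # illustrative value, not taken from the paper
    freqs = quanta_frequencies(1_000_000, alpha)
    for n in (10, 100, 1000, 10000):
        print(n, expected_loss(freqs, n))
    # The fitted exponent should come out close to alpha.
    print("fitted exponent:", fitted_exponent(freqs, 100, 1000))
```

Because the number of quanta is finite, the fitted exponent only approximates alpha, and the approximation degrades as n approaches the total number of quanta.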
Dec-25-2025, 11:45:53 GMT