Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
Noa Rubin, Orit Davidovich, Zohar Ringel
Two pressing topics in the theory of deep learning are the interpretation of feature learning mechanisms and the determination of the implicit bias of networks in the rich regime. Current theories of rich feature learning often appear in the form of high-dimensional non-linear equations, which require computationally intensive numerical solutions. Given the many details that go into defining a deep learning problem, this complexity is a significant and often unavoidable challenge. Here, we propose a powerful heuristic route for predicting the data and width scales at which various patterns of feature learning emerge. This form of scale analysis is considerably simpler than exact theories and reproduces the scaling exponents of various known results. In addition, we make novel predictions on complex toy architectures, such as three-layer non-linear networks and attention heads, thus extending the scope of first-principles theories of deep learning.
Dec-10-2025