Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks

Jun-9-2026, 21:41:19 GMT–Neural Information Processing Systems

What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. AGF begins with all neurons dormant, corresponding to an initialization at the origin.
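The alternating two-step structure described above can be illustrated with a deliberately simplified toy; this is a sketch under stated assumptions, not the paper's definitions. Here each "neuron" is a single input coordinate of a linear model, the utility of a dormant coordinate is assumed to be its absolute correlation with the current residual, and the cost is assumed to be squared loss over the active coordinates. The function name `toy_alternating_steps` and all of its parameters are hypothetical.

```python
import numpy as np

def toy_alternating_steps(X, y, tol=1e-10):
    """Toy alternation: pick the dormant coordinate with the largest
    (assumed) utility, then minimize the (assumed) squared-loss cost
    over all active coordinates. Not the paper's utility or cost."""
    n, d = X.shape
    is_active = np.zeros(d, dtype=bool)  # all coordinates start dormant
    order = []                           # activation order, for inspection
    w = np.zeros(d)                      # initialization at the origin
    losses = [0.5 * np.mean(y ** 2)]     # loss at the origin
    for _ in range(d):
        residual = y - X @ w
        # Step 1 (assumption): utility = |correlation with the residual|.
        utility = np.abs(X.T @ residual) / n
        utility[is_active] = -np.inf     # only dormant coordinates compete
        best = int(np.argmax(utility))
        if utility[best] <= tol:
            break                        # nothing left worth activating
        is_active[best] = True
        order.append(best)
        # Step 2 (assumption): exact least squares on the active coordinates.
        coef, *_ = np.linalg.lstsq(X[:, is_active], y, rcond=None)
        w = np.zeros(d)
        w[is_active] = coef
        losses.append(0.5 * np.mean((y - X @ w) ** 2))
    return w, order, losses

# Example call on a tiny problem with two relevant coordinates.
X_demo = np.eye(4)
y_demo = np.array([3.0, 0.0, 1.0, 0.0])
w_demo, order_demo, losses_demo = toy_alternating_steps(X_demo, y_demo)
```

In this toy, the recorded losses drop once per activation, which loosely mirrors the staircase-like loss curve mentioned in the abstract. The plateaus themselves (slow alignment under gradient flow) are not modeled here.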

artificial intelligence, machine learning, proceedings, (5 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)