Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
Nikolaos Tsilivis, Gal Vardi, Julia Kempe
We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent, and coordinate descent, in deep homogeneous neural networks. We prove that an algorithm-dependent geometric margin begins to increase once the networks reach perfect training accuracy, and we characterize the late-stage bias of the algorithms. In particular, we define a generalized notion of stationarity for optimization problems and show that the algorithms progressively reduce a (generalized) Bregman divergence, which quantifies proximity to such stationary points of a margin-maximization problem. We then experimentally zoom into the trajectories of neural networks optimized with various steepest descent algorithms, highlighting connections to the implicit bias of Adam.
Oct-29-2024
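The steepest descent family mentioned in the abstract is parameterized by a choice of norm: steepest descent with respect to the ℓ2 norm recovers gradient descent, the ℓ∞ norm yields sign descent, and the ℓ1 norm yields coordinate descent on the largest-magnitude gradient entry. The following minimal sketch illustrates these standard update rules; it is not code from the paper, and the function names are illustrative.

```python
import numpy as np

def steepest_descent_step(w, grad, lr, norm="l2"):
    """One steepest descent step with respect to a chosen norm.

    The norm determines the update direction (a standard correspondence,
    not an implementation from the paper):
      l2   -> gradient descent
      linf -> sign descent
      l1   -> coordinate descent on the largest-|grad| coordinate
    """
    if norm == "l2":
        # Move against the raw gradient.
        return w - lr * grad
    if norm == "linf":
        # Move against the sign of each coordinate of the gradient.
        return w - lr * np.sign(grad)
    if norm == "l1":
        # Update only the coordinate with the largest gradient magnitude.
        step = np.zeros_like(w)
        i = np.argmax(np.abs(grad))
        step[i] = np.sign(grad[i])
        return w - lr * step
    raise ValueError(f"unknown norm: {norm}")
```

For example, with `w = [1.0, -2.0]`, `grad = [0.5, -1.0]`, and `lr = 0.1`, the ℓ∞ step moves every coordinate by 0.1 against the gradient's sign, while the ℓ1 step moves only the second coordinate, whose gradient magnitude is largest.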