Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

Neural Information Processing Systems 

We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions, including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation. We rigorously prove that gradient flow (i.e., gradient descent with infinitesimal step size) effectively enforces the differences between squared norms across different layers to remain invariant without any explicit regularization. This result implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers. Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization. Inspired by our findings for gradient flow, we prove that gradient descent with step sizes $\eta_t = O(t^{-(1/2+\delta)})$ ($0 < \delta \le 1/2$) automatically balances the two low-rank factors and converges to a bounded global optimum.
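As a rough numerical illustration (not part of the paper), the sketch below runs unregularized gradient descent on the asymmetric matrix factorization objective $\frac{1}{2}\|UV^\top - M\|_F^2$ from a small initialization, with decaying step sizes of the stated form $\eta_t = c\,(t+1)^{-(1/2+\delta)}$. The constants `c`, `delta`, the problem sizes, and the random target `M` are hypothetical choices for demonstration; the quantity to watch is $\|U\|_F^2 - \|V\|_F^2$, which stays close to its (small) initial value while both factor norms grow.

```python
# Illustrative sketch (assumptions: problem sizes, c, delta, and M are chosen
# arbitrarily for demonstration; this is not the paper's experimental setup).
# Objective: f(U, V) = 1/2 * ||U V^T - M||_F^2, minimized by plain gradient
# descent with step sizes eta_t = c * (t + 1)^(-(1/2 + delta)).
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 30, 20, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # rank-r target
M /= np.linalg.norm(M, 2)  # normalize spectral norm for a stable step size

# Small initialization, as assumed in the balancing result.
U = 1e-3 * rng.standard_normal((n, r))
V = 1e-3 * rng.standard_normal((m, r))
init_balance = np.linalg.norm(U) ** 2 - np.linalg.norm(V) ** 2

c, delta = 0.3, 0.25  # hypothetical constants; the paper requires 0 < delta <= 1/2
for t in range(50000):
    eta = c * (t + 1) ** (-(0.5 + delta))
    R = U @ V.T - M                      # residual
    # Gradients: df/dU = R V, df/dV = R^T U (both use the pre-update iterates).
    U, V = U - eta * (R @ V), V - eta * (R.T @ U)

loss = 0.5 * np.linalg.norm(U @ V.T - M) ** 2
balance = np.linalg.norm(U) ** 2 - np.linalg.norm(V) ** 2
print(f"loss = {loss:.3e}")
print(f"||U||_F^2 = {np.linalg.norm(U)**2:.3f}, ||V||_F^2 = {np.linalg.norm(V)**2:.3f}")
print(f"balance gap: init = {init_balance:.3e}, final = {balance:.3e}")
```

In this sketch the two factors end up with nearly equal Frobenius norms even though nothing in the objective couples their scales, which is the discrete analogue of the gradient-flow invariance described above.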