Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

Simon S. Du, Wei Hu, Jason D. Lee

Neural Information Processing Systems 

We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation. We rigorously prove that gradient flow (i.e.