The Implicit Biases of Stochastic Gradient Descent on Deep Neural Networks with Batch Normalization

Open in new window