Phase diagram of early training dynamics in deep networks: effect of the learning rate, depth, and width
Neural Information Processing Systems
We systematically analyze the optimization dynamics of deep neural networks (DNNs) trained with stochastic gradient descent (SGD), studying the effect of the learning rate η, the depth d, and the width w of the network.
Mar-27-2025, 16:23:47 GMT