Improved Analysis of Clipping Algorithms for Non-convex Optimization

Neural Information Processing Systems 

Gradient clipping is commonly used in training deep neural networks, partly due to its practicality in alleviating the exploding gradient problem. Recently, Zhang et al. [2020a] showed that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD by introducing a new assumption called (L0, L1)-smoothness, which characterizes the violent fluctuation of gradients typically encountered in deep neural networks.
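
For context, the clipped GD update studied in this line of work and the (L0, L1)-smoothness condition it builds on can be sketched as follows; the step size $\eta$ and clipping threshold $\gamma$ are generic symbols used for illustration and need not match the paper's notation:

\[
x_{k+1} \;=\; x_k \;-\; \min\!\left\{\eta,\; \frac{\gamma}{\|\nabla f(x_k)\|}\right\}\nabla f(x_k),
\qquad
\big\|\nabla^2 f(x)\big\| \;\le\; L_0 + L_1\,\big\|\nabla f(x)\big\|.
\]

The second inequality relaxes standard L-smoothness by letting the local smoothness constant grow with the gradient norm, which is the regime in which clipping provably helps.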
