Improved Analysis of Clipping Algorithms for Non-convex Optimization

Neural Information Processing Systems 

Gradient clipping is commonly used in training deep neural networks, partly because it is a practical remedy for the exploding gradient problem. Recently, Zhang et al. [2020a] showed that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD by introducing a new assumption called (L0, L1)-smoothness, which characterizes the violent fluctuation of gradients typically encountered in deep neural networks.
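
To make the clipping step concrete, here is a minimal sketch of norm-based clipped gradient descent in plain Python. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the helper names (clip_by_norm, clipped_gd, grad_quartic), the step size, the threshold, and the quartic test objective are all hypothetical choices made for this example. For context, (L0, L1)-smoothness is usually stated as the condition that the Hessian norm is bounded by L0 + L1 * ||grad f(x)||, so curvature may grow with the gradient norm; the quartic below is a simple function with that kind of behavior.

```python
import math


def clip_by_norm(grad, threshold):
    """Rescale grad so its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        # Small gradients (including the zero vector) pass through unchanged.
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def clipped_gd(grad_fn, x0, lr=0.1, threshold=1.0, steps=100):
    """Gradient descent where each gradient is clipped before the update.

    grad_fn, lr, threshold and steps are illustrative parameters,
    not values taken from the paper.
    """
    x = list(x0)
    for _ in range(steps):
        g = clip_by_norm(grad_fn(x), threshold)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def grad_quartic(x):
    """Gradient of the assumed test objective f(x) = sum(x_i ** 4)."""
    return [4.0 * xi ** 3 for xi in x]


if __name__ == "__main__":
    # Far from the minimum the raw gradient is large (4000 at x = 10),
    # so clipping caps each step at lr * threshold.
    print(clipped_gd(grad_quartic, [10.0], lr=0.1, threshold=1.0, steps=200))
```

With these settings, a single unclipped step from x = 10 would move the iterate by roughly 400, far past the minimizer at 0, whereas the clipped update moves by at most lr * threshold per step until the gradient norm falls below the threshold.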
