Improved Analysis of Clipping Algorithms for Non-convex Optimization
Neural Information Processing Systems
Gradient clipping is commonly used in training deep neural networks, partly due to its effectiveness in mitigating the exploding gradient problem. Recently, Zhang et al. [2020a] showed that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD by introducing a new assumption called $(L_0, L_1)$-smoothness.
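For intuition, below is a minimal NumPy sketch of the norm-clipping update: the gradient is rescaled whenever its norm exceeds a threshold, so the step length stays bounded even where gradients blow up. (Zhang et al.'s $(L_0, L_1)$-smoothness assumption, $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$, allows exactly this kind of growth.) The function name and the hyperparameters `lr` and `clip_threshold` are illustrative placeholders, not notation from the paper.

```python
import numpy as np

def clipped_gd_step(x, grad_fn, lr=0.1, clip_threshold=1.0):
    """One clipped GD step: rescale the gradient so its Euclidean norm
    is at most `clip_threshold`, then take an ordinary descent step."""
    g = grad_fn(x)
    g_norm = np.linalg.norm(g)
    if g_norm > clip_threshold:
        g = g * (clip_threshold / g_norm)  # clip: norm equals the threshold
    return x - lr * g

# Toy usage: f(x) = ||x||^4 / 4 has gradient ||x||^2 * x, whose Lipschitz
# constant grows with ||x|| -- a setting where (L0, L1)-smoothness holds
# but global L-smoothness fails.
x = np.array([3.0, -2.0])
grad_fn = lambda x: np.dot(x, x) * x
for _ in range(200):
    x = clipped_gd_step(x, grad_fn)
print(x)  # approaches the minimizer at the origin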