Improved Analysis of Clipping Algorithms for Non-convex Optimization
Neural Information Processing Systems
Gradient clipping is commonly used in training deep neural networks, partly because it is a practical remedy for the exploding gradient problem. Recently, Zhang et al. [2020a] showed that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD by introducing a new assumption called (L0, L1)-smoothness, which characterizes the violent fluctuation of gradients typically encountered in deep neural networks.
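To make the clipping step concrete, the following is a minimal sketch of deterministic clipped gradient descent, not the algorithm analyzed in the paper. It assumes the common form of the update, x ← x − min(η, γ/‖∇f(x)‖) ∇f(x), so that no single step moves the iterate farther than γ. The function names `clipped_gd` and `grad_quartic`, the parameter values, and the test objective f(x) = Σ xᵢ⁴ are all illustrative assumptions; the quartic is chosen only because its curvature grows with the gradient norm, which is the kind of behavior (L0, L1)-smoothness is meant to capture.

```python
import math


def clipped_gd(grad, x0, eta=0.1, gamma=1.0, steps=100):
    """Illustrative clipped gradient descent (hypothetical helper).

    Each update uses the step size min(eta, gamma / ||g||), so the
    iterate never moves farther than gamma in a single step.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break  # stationary point reached, nothing left to do
        step = min(eta, gamma / norm)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def grad_quartic(x):
    """Gradient of the assumed test objective f(x) = sum(x_i ** 4)."""
    return [4.0 * xi ** 3 for xi in x]


if __name__ == "__main__":
    # Far from the minimizer the gradient is large, so the clipped
    # branch gamma / ||g|| is active; near it the plain step eta is used.
    print(clipped_gd(grad_quartic, [10.0]))
```

Setting `gamma=math.inf` disables clipping and recovers vanilla GD with step size `eta`. On this objective a single unclipped step from x = 10 overshoots far past the minimizer, while the clipped iterates approach it steadily.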
Feb-9-2026, 21:30:41 GMT