Improved Analysis of Clipping Algorithms for Non-convex Optimization

Dec-24-2025, 11:18:15 GMT–Neural Information Processing Systems

Gradient clipping is commonly used in training deep neural networks, partly because it is a practical remedy for the exploding gradient problem. Recently, Zhang et al. (2019) showed that clipped (stochastic) gradient descent (GD) converges faster than vanilla GD by introducing a new assumption called $(L_0, L_1)$-smoothness, which characterizes the violent fluctuation of gradients typically encountered in deep neural networks. However, the dependence of their iteration complexities on the problem-dependent parameters is rather pessimistic, and theoretical justification of clipping combined with other crucial techniques, e.g.
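
For orientation, the $(L_0, L_1)$-smoothness condition is usually stated as $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$, so the local curvature may grow with the gradient norm. The following is a minimal sketch of deterministic clipped GD under that reading, not the paper's algorithm or analysis. The function name `clipped_gd`, its parameters, and the toy objective $f(x) = x^4$ are assumptions chosen for illustration only.

```python
import numpy as np


def clipped_gd(grad, x0, lr=0.1, clip_threshold=1.0, num_steps=100):
    """Gradient descent with norm clipping (illustrative sketch).

    Each update uses the effective step size min(lr, clip_threshold / ||g||),
    so the length of a single update never exceeds clip_threshold.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad(x)
        g_norm = np.linalg.norm(g)
        # Avoid division by zero at a stationary point.
        step = lr if g_norm == 0.0 else min(lr, clip_threshold / g_norm)
        x = x - step * g
    return x


def grad_quartic(x):
    """Gradient of the hypothetical toy objective f(x) = x^4."""
    return 4.0 * x ** 3


# Starting far from the minimizer, the raw gradient is large (4000 at x = 10),
# so an unclipped step with lr = 0.1 would overshoot badly. Clipping caps each
# move at length 1 until the gradient becomes small enough.
x_final = clipped_gd(grad_quartic, x0=[10.0], lr=0.1,
                     clip_threshold=1.0, num_steps=200)
```

The toy objective is not globally smooth in the usual sense, since its second derivative $12x^2$ is unbounded, yet it is bounded by a constant plus a multiple of the gradient norm $4|x|^3$, which is the kind of behavior the relaxed assumption is meant to capture.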

clipping algorithm, improved analysis, name change, (3 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.64)
  - Statistical Learning > Gradient Descent (0.60)