Parameter-free Clipped Gradient Descent Meets Polyak

Yuki Takezawa

Neural Information Processing Systems 

Gradient descent and its variants are the de facto standard algorithms for training machine learning models. Since gradient descent is sensitive to its hyperparameters, they must be tuned carefully, for example, using a grid search.
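The sensitivity described above can be seen even on a toy problem. The following sketch (an illustration, not from the paper) runs plain gradient descent on f(x) = x² with several candidate step sizes, as one would try in a grid search: too small a step converges slowly, while too large a step diverges.

```python
def gradient_descent(grad, x0, lr, steps):
    """Run `steps` iterations of gradient descent starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # standard update: x_{t+1} = x_t - lr * grad(x_t)
    return x

# Gradient of f(x) = x^2, which is minimized at x = 0.
grad = lambda x: 2.0 * x

# Candidate step sizes, as in a grid search over the learning rate.
for lr in [0.01, 0.1, 0.5, 1.1]:
    x = gradient_descent(grad, x0=1.0, lr=lr, steps=50)
    print(f"lr={lr}: final x = {x:.4g}")
```

With lr = 1.1 the update multiplies the iterate by (1 - 2.2) = -1.2 each step, so the iterates blow up, whereas moderate step sizes drive x toward the minimizer at 0; this is exactly the kind of tuning burden that parameter-free methods aim to remove.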