Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization
