Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization