Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate