Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate

Open in new window