Towards understanding how momentum improves generalization in deep learning
