Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
