The Marginal Value of Adaptive Gradient Methods in Machine Learning