On Empirical Comparisons of Optimizers for Deep Learning