Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning