Understanding the Role of Momentum in Stochastic Gradient Methods