A Study of Gradient Variance in Deep Learning