On the Distributional Properties of Adaptive Gradients