On the SDEs and Scaling Rules for Adaptive Gradient Algorithms