Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods