Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods
