Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
