Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
