Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad

Neural Information Processing Systems 

Adaptive methods are widely used in machine learning because they reduce the cost of learning-rate tuning. This paper introduces a novel optimization algorithm named KATE, a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove that KATE is scale-invariant in the case of Generalized Linear Models.
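For reference, below is a minimal NumPy sketch of the standard diagonal AdaGrad update that KATE modifies; the function name `adagrad_step`, the toy quadratic objective, and all hyperparameter values are illustrative choices, not taken from the paper. The square root highlighted in the comments is the one the title refers to; KATE's own update rule, which removes it while preserving scale invariance, is defined in the paper body.

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad step (standard formulation).

    accum holds the coordinate-wise running sum of squared gradients.
    The sqrt in the denominator is the square root that KATE removes
    (its exact replacement is given in the paper, not reproduced here).
    """
    accum += grad ** 2
    x -= lr * grad / (np.sqrt(accum) + eps)
    return x, accum

# Toy usage on a simple quadratic f(x) = 0.5 * ||x||^2 (illustrative only).
x = np.array([5.0, -3.0])
accum = np.zeros_like(x)
for _ in range(200):
    grad = x  # gradient of 0.5 * ||x||^2
    x, accum = adagrad_step(x, grad, accum)
print(x)  # approaches the minimizer at the origin
```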