Time Trials on Second-Order and Variable-Learning-Rate Algorithms
Neural Information Processing Systems
In four of these methods, the gradient is divided component-wise by a decaying average of either the second derivatives or their absolute values.
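The scaling described above can be illustrated with a minimal sketch. It is not the paper's exact update rule, which this excerpt does not give. The function name `scaled_step` and the parameters `lr`, `decay`, `use_abs`, and `eps` are hypothetical, and the small constant `eps` is an added safeguard against division by zero.

```python
def scaled_step(weights, grads, second_derivs, avg,
                lr=0.1, decay=0.9, use_abs=True, eps=1e-8):
    """One illustrative update step; returns (new_weights, new_avg).

    All arguments are plain lists of floats with one entry per weight.
    `avg` holds the running (decaying) average from the previous step.
    """
    new_weights, new_avg = [], []
    for w, g, h, a in zip(weights, grads, second_derivs, avg):
        # Use either the second derivative or its absolute value.
        term = abs(h) if use_abs else h
        # Decaying (exponential moving) average of that term.
        a = decay * a + (1.0 - decay) * term
        new_avg.append(a)
        # Divide the gradient component-wise by the average (eps is an assumption).
        new_weights.append(w - lr * g / (a + eps))
    return new_weights, new_avg


# Toy quadratic f(w) = 0.5 * sum(c_i * w_i**2): gradient c_i * w_i, second derivative c_i.
curvature = [1.0, 100.0]
w = [1.0, 1.0]
grads = [c * wi for c, wi in zip(curvature, w)]
w_next, avg_next = scaled_step(w, grads, curvature, avg=list(curvature))
```

In this toy case the two weights shrink by roughly the same factor even though their curvatures differ by a factor of 100, which is the intended effect of the per-component division. With `use_abs=False` the average can become negative or close to zero when second derivatives change sign, so that variant needs more care than this sketch shows.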
Dec-31-1991