Reviews: Efficient Second Order Online Learning by Sketching

Neural Information Processing Systems 

The present work takes a significant step in addressing this. The primary algorithmic contribution of the paper is a set of variants of Online Newton Step that remove this drawback using a sketched approximation to the scaling matrix together with a clever implementation of sparse updates. The primary theoretical contribution is the regret analysis of the random projection (RP) and Frequent Directions (FD) versions of the algorithm. For RP, the authors show a regret bound that holds when the matrix G_T of observed gradients is actually low-rank. Given the structure of the loss functions assumed, f_t(w) = \ell(\langle w, x_t \rangle), every gradient is a scalar multiple of its example, \nabla f_t(w) = \ell'(\langle w, x_t \rangle) x_t, and so I think this theorem only holds when the data itself is actually low-rank.
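To make that last point concrete, here is a quick numerical check (my own illustration, not taken from the paper) that the gradient matrix inherits the rank of the data matrix under this loss structure; the choice of logistic loss and all variable names are mine:

    # For f_t(w) = loss(<w, x_t>), each gradient is a scalar multiple of x_t,
    # so the gradient matrix G is low-rank exactly when the data X is.
    import numpy as np

    rng = np.random.default_rng(0)
    T, d, r = 200, 50, 5                                           # T examples, d dims, intrinsic rank r
    X = rng.standard_normal((T, r)) @ rng.standard_normal((r, d))  # rank-r data matrix
    y = rng.choice([-1.0, 1.0], size=T)
    w = rng.standard_normal(d)

    # Logistic loss f_t(w) = log(1 + exp(-y_t <w, x_t>)), so
    # grad f_t(w) = -y_t / (1 + exp(y_t <w, x_t>)) * x_t
    margins = y * (X @ w)
    scalars = -y / (1.0 + np.exp(margins))
    G = scalars[:, None] * X                                       # row t is grad f_t(w)

    print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(G))      # both r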
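Separately, for readers unfamiliar with the sketching primitive behind the FD variant, the following is a minimal Frequent Directions sketch in the spirit of Liberty (2013); it is the generic streaming sketch, not the paper's actual Sketched Online Newton update, and the function name is my own:

    import numpy as np

    def frequent_directions(A, ell):
        # Streaming sketch: maintains B (ell x d) so that B.T @ B
        # approximates A.T @ A, processing one row of A at a time.
        n, d = A.shape
        B = np.zeros((ell, d))
        z = 0                                    # index of next free (zero) row
        for a in A:
            if z == ell:                         # sketch full: shrink via SVD
                _, s, Vt = np.linalg.svd(B, full_matrices=False)
                s2 = np.maximum(s ** 2 - s[-1] ** 2, 0.0)
                B = np.sqrt(s2)[:, None] * Vt    # smallest direction is zeroed out
                z = int(np.count_nonzero(s2))    # rows freed by the shrink
                B[z:] = 0.0
            B[z] = a                             # insert the new row
            z += 1
        return B

A sketch of this kind, applied to the rows of G_T, is what stands in for the full d x d scaling matrix in the FD variant, which is where the computational savings come from.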