Reviews: Stochastic Chebyshev Gradient Descent for Spectral Optimization

Neural Information Processing Systems 

Spectral optimization is defined as finding \theta that minimizes F(A(\theta)) + g(\theta), where A(\theta) is a symmetric matrix and F is typically the trace of an analytic function, i.e. F(A) = tr(p(A)) where p is a polynomial. The authors propose an unbiased estimator of F obtained by randomly truncating the Chebyshev approximation to F and applying importance sampling. Moreover, they derive the optimal distribution for this importance sampling. They demonstrate how this method can be used for SGD and Stochastic Variance Reduced Gradient (SVRG).
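
To make the randomized-truncation idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes A is symmetric with eigenvalues already scaled into [-1, 1], takes the degree-K Chebyshev interpolant of the scalar function as the "full" series, and pairs the random degree with a Rademacher probe vector (Hutchinson's trace estimator). The function names and the geometric degree distribution q are hypothetical choices for illustration; the optimal distribution derived in the paper is not reproduced here. Each coefficient c_j is reweighted by 1 / P(N >= j), which is what makes the truncated sum unbiased for the full degree-K series.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def survival_probs(q):
    """Return s[j] = P(N >= j) for a degree distribution q over {0, ..., K}."""
    return np.cumsum(q[::-1])[::-1]


def truncated_quadratic_form(A, v, coeffs, q, n):
    """Evaluate v^T [ sum_{j<=n} coeffs[j] / s[j] * T_j(A) ] v.

    Assumes A is symmetric with eigenvalues in [-1, 1]. The vectors T_j(A) v
    come from the recurrence T_{j+1}(A) v = 2 A T_j(A) v - T_{j-1}(A) v.
    """
    s = survival_probs(q)
    w_prev, w_curr = v, A @ v  # T_0(A) v and T_1(A) v
    total = coeffs[0] / s[0] * (v @ w_prev)
    if n >= 1:
        total += coeffs[1] / s[1] * (v @ w_curr)
    for j in range(2, n + 1):
        w_prev, w_curr = w_curr, 2.0 * (A @ w_curr) - w_prev
        total += coeffs[j] / s[j] * (v @ w_curr)
    return total


def sample_trace_estimate(A, coeffs, q, rng):
    """Draw one sample: random degree n ~ q and a Rademacher probe vector v."""
    n = rng.choice(len(q), p=q)
    v = rng.choice([-1.0, 1.0], size=A.shape[0])
    return truncated_quadratic_form(A, v, coeffs, q, n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((50, 50))
    A = (M + M.T) / 2
    A /= 1.1 * np.linalg.norm(A, 2)           # scale the spectrum into [-1, 1]

    K = 12                                    # maximum degree (assumed)
    coeffs = cheb.chebinterpolate(np.exp, K)  # Chebyshev coefficients of exp
    q = 0.5 ** np.arange(K + 1)               # illustrative geometric weights
    q /= q.sum()

    samples = [sample_trace_estimate(A, coeffs, q, rng) for _ in range(2000)]
    print("sample mean:", np.mean(samples))
    print("tr(exp(A)) :", np.exp(np.linalg.eigvalsh(A)).sum())
```

The sample mean should approach tr(exp(A)) as the number of samples grows, up to the (small) degree-K interpolation error. How quickly it does so depends on the choice of q, which is why the distribution over degrees matters.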