On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Aaron Defazio, Leon Bottou

Neural Information Processing Systems 

SVR methods use control variates to reduce the variance of the traditional stochastic gradient descent (SGD) estimate f_i'(w) of the full gradient f'(w). Control variates are a classical technique for reducing the variance of a stochastic quantity without introducing bias. Say we have some random variable X.
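As a rough illustration of the control variate idea (not the paper's own construction), the sketch below estimates E[X] by sampling X - (Y - E[Y]), where Y is correlated with X and has a known mean. The choices X = exp(U), Y = U with U uniform on [0, 1], the fixed coefficient of 1 on the correction term, and the function name `control_variate_demo` are all assumptions made for this example.

```python
import math
import random
import statistics


def control_variate_demo(n_samples=20000, seed=0):
    """Compare a plain Monte Carlo estimate of E[X] with a control variate one.

    Illustrative assumptions: X = exp(U), Y = U, U ~ Uniform(0, 1), so
    E[Y] = 0.5 is known exactly and the true value is E[X] = e - 1.
    """
    rng = random.Random(seed)
    plain = []     # samples of X
    adjusted = []  # samples of X - (Y - E[Y])
    for _ in range(n_samples):
        u = rng.random()
        x = math.exp(u)  # quantity whose mean we want
        y = u            # correlated variable with known mean 0.5
        plain.append(x)
        # Subtracting the zero-mean term (y - 0.5) leaves the mean unchanged
        # but cancels part of the fluctuation that x shares with y.
        adjusted.append(x - (y - 0.5))
    return {
        "true_mean": math.e - 1.0,
        "plain_mean": statistics.fmean(plain),
        "adjusted_mean": statistics.fmean(adjusted),
        "plain_variance": statistics.variance(plain),
        "adjusted_variance": statistics.variance(adjusted),
    }


if __name__ == "__main__":
    for name, value in control_variate_demo().items():
        print(f"{name}: {value:.4f}")
```

Both sample means should land near e - 1, while the adjusted samples show a noticeably smaller sample variance. In the SVR setting the same role would be played by a gradient evaluated at a stored snapshot point, whose mean over the data is the snapshot's full gradient.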
