Appendix

The following lemma demonstrates the convergence of the SGD framework when the gradient estimator v(x) is unbiased and has bounded variance. In contrast, these papers assumed that the norm of the gradient estimator, i.e., ‖v(x)‖, is bounded; one can also refer to the recent survey [6] for more general results on SGD. When the variance of v(x) is of order O(ϵ), one can use stepsizes that are independent of ϵ to guarantee ϵ-optimality or ϵ-stationarity, and the algorithm then behaves similarly to gradient descent.

We prove the case in which F(x) is convex; the inductive step supposes that the claim holds at iteration t.

This section characterizes the bias, variance, and per-iteration cost of the L-SGD and the MLMC-based gradient estimators.
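To make the remark about O(ϵ)-variance estimators concrete, the following is a minimal sketch, assuming a toy quadratic objective F(x) = 0.5‖x‖² rather than any problem from the paper; it runs plain SGD with an unbiased gradient estimator whose variance is small, so a constant stepsize already drives the iterates toward the optimum much like exact gradient descent. The names grad_estimator, sgd, sigma, and eta are illustrative assumptions, not notation taken from the text.

```python
import numpy as np


def true_grad(x):
    """Exact gradient of the toy objective F(x) = 0.5 * ||x||^2."""
    return x


def grad_estimator(x, sigma, rng):
    """Unbiased estimator v(x) = grad F(x) + noise, with per-coordinate variance sigma^2."""
    return true_grad(x) + sigma * rng.standard_normal(x.shape)


def sgd(x0, eta=0.1, n_iters=500, sigma=0.01, seed=0):
    """Plain SGD iteration x_{t+1} = x_t - eta * v(x_t) with a constant stepsize."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iters):
        x -= eta * grad_estimator(x, sigma, rng)
    return x


if __name__ == "__main__":
    x0 = np.ones(10)
    # With sigma^2 small (playing the role of an O(eps) variance), the final
    # iterate is close to the optimum at 0 even though eta does not depend on eps.
    x_final = sgd(x0)
    print("final distance to optimum:", np.linalg.norm(x_final))
```

Increasing sigma in this sketch illustrates the opposite regime: once the estimator's variance is no longer small, a constant stepsize leaves a noise floor and the stepsize (or an averaging scheme) must be tuned to the target accuracy.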
