Appendix
The following lemma demonstrates the convergence of the SGD framework when the gradient estimator v(x) is unbiased and has bounded variance. In contrast, these papers assumed that the norm of the gradient estimator v(x) is bounded. One can also refer to the recent survey [6] for more general results on SGD. When the variance of v(x) is of order O(ϵ), one can use stepsizes that are independent of ϵ to guarantee ϵ-optimality or ϵ-stationarity; the algorithm then behaves similarly to gradient descent.

We prove the case when F(x) is convex. Suppose that the claim holds for iteration t.

This section analyzes the bias, variance, and per-iteration cost of the L-SGD and the MLMC-based gradient estimators.
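To make the bias–variance–cost trade-off concrete, below is a minimal sketch of a randomized multilevel Monte Carlo (MLMC) gradient estimator in Python. The interface base_grad(x, level, rng), the geometric level distribution, and the toy quadratic example are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import numpy as np

def mlmc_gradient(x, base_grad, max_level, rng):
    # Randomized MLMC estimator (sketch): sample a level l with
    # probability p_l proportional to 2**(-l), then return the level-0
    # estimate plus the importance-weighted telescoping difference.
    # Its expectation telescopes to that of base_grad at the finest
    # level (max_level + 1), so the bias matches the finest level while
    # expensive levels are only sampled rarely, keeping the expected
    # per-iteration cost low.
    probs = 2.0 ** -np.arange(max_level + 1)
    probs /= probs.sum()
    l = rng.choice(max_level + 1, p=probs)
    v = base_grad(x, 0, rng)
    v = v + (base_grad(x, l + 1, rng) - base_grad(x, l, rng)) / probs[l]
    return v

# Toy usage (hypothetical): a biased estimate of the gradient of
# F(x) = ||x||^2 / 2 whose bias and noise both shrink with the level,
# at a cost proxy of 2**level inner samples.
def base_grad(x, level, rng):
    n = 2 ** level
    bias = 1.0 / n                                 # bias decays geometrically
    noise = rng.standard_normal(x.shape) / np.sqrt(n)
    return x + bias + noise

rng = np.random.default_rng(0)
v = mlmc_gradient(np.ones(3), base_grad, max_level=8, rng=rng)
```

In practice, adjacent levels are coupled (they reuse the same inner samples) so that the level differences, and hence the estimator's variance, decay fast enough for the bounded-variance SGD lemma above to apply; the toy base_grad draws levels independently only for brevity.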