Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

Neural Information Processing Systems 

In particular, we establish the surprising result that: for any constant learning rate η > 0, the stochastic gradient bandit algorithm is guaranteed to converge to the globally optimal policy almost surely.
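The algorithm in question is the standard softmax policy-gradient (REINFORCE) update for multi-armed bandits, run with a fixed step size. The following is a minimal sketch under illustrative assumptions (a hypothetical 3-armed Bernoulli bandit, η = 1); the arm means and horizon are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed Bernoulli bandit; these means are illustrative only.
true_means = np.array([0.2, 0.5, 0.8])
K = len(true_means)

theta = np.zeros(K)  # softmax logits, one per arm
eta = 1.0            # constant learning rate; the result allows any eta > 0

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

for t in range(50_000):
    pi = softmax(theta)
    a = rng.choice(K, p=pi)                    # sample an arm from the policy
    r = float(rng.random() < true_means[a])    # Bernoulli reward
    # Stochastic softmax policy-gradient step:
    # grad_theta log pi(a) = onehot(a) - pi
    theta += eta * r * (np.eye(K)[a] - pi)

pi = softmax(theta)  # final policy; mass should concentrate on the best arm
```

The claim is that this iteration concentrates on the optimal arm almost surely even though η is not decayed, in contrast to classical stochastic-approximation analyses that require diminishing step sizes.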
