Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
–Neural Information Processing Systems
In particular, we establish the surprising result that: For any constant learning rate η > 0, the stochastic gradient bandit algorithm is guaranteed to converge to the globally optimal policy almost surely.
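The result concerns softmax policy-gradient ("stochastic gradient bandit") updates run with a fixed step size. A minimal sketch of that update rule, assuming a Gaussian reward model and illustrative arm means and function names that are not from the paper:

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def stochastic_gradient_bandit(means, eta=1.0, steps=20000, seed=0):
    """Softmax policy-gradient bandit with a constant learning rate eta.

    Each step: sample an arm a ~ pi_theta, observe a noisy reward r, then
    take an unbiased REINFORCE-style gradient step on the expected reward.
    (Reward model and hyperparameters here are illustrative assumptions.)
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    theta = np.zeros(K)
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(K, p=pi)
        r = means[a] + 0.1 * rng.standard_normal()  # noisy reward sample
        # Unbiased estimate of the gradient of E[r]: r * (e_a - pi)
        grad = -r * pi
        grad[a] += r
        theta += eta * grad  # constant learning rate, no decay schedule
    return softmax(theta)

pi = stochastic_gradient_bandit(np.array([0.2, 0.5, 0.9]), eta=1.0)
print(pi.argmax())  # the policy concentrates on the highest-mean arm
```

With a large constant η the per-step updates are noisy, which is exactly the regime the paper's almost-sure convergence guarantee covers.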