On Explore-Then-Commit Strategies
Neural Information Processing Systems
We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known.
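As an illustration of the kind of strategy the abstract refers to, the following is a minimal sketch of an explore-then-commit strategy on a two-armed Gaussian bandit with unit variance. It assumes the simplest case, where the exploration phase has a fixed length rather than a data-dependent stopping time. The function name `explore_then_commit` and the parameters `means`, `horizon`, `explore_per_arm` and `seed` are hypothetical choices for this sketch and do not come from the paper.

```python
import random


def explore_then_commit(means, horizon, explore_per_arm, seed=0):
    """Simulate a fixed-length explore-then-commit strategy on two arms.

    Assumptions of this sketch (not taken from the paper): rewards are
    Gaussian with unit variance, and the exploration length is fixed in
    advance instead of being a data-dependent stopping time.
    """
    if explore_per_arm < 1 or 2 * explore_per_arm > horizon:
        raise ValueError("need 1 <= explore_per_arm <= horizon / 2")

    rng = random.Random(seed)
    sums = [0.0, 0.0]   # cumulative reward observed on each arm
    pulls = [0, 0]      # number of times each arm has been pulled

    # Exploration phase: alternate between the two arms.
    for t in range(2 * explore_per_arm):
        arm = t % 2
        sums[arm] += rng.gauss(means[arm], 1.0)
        pulls[arm] += 1

    # Commit phase: play the arm with the larger empirical mean
    # for all remaining rounds (ties go to arm 0).
    committed = 0 if sums[0] / pulls[0] >= sums[1] / pulls[1] else 1
    pulls[committed] += horizon - 2 * explore_per_arm

    # Pseudo-regret: gap to the best mean, weighted by pull counts.
    best = max(means)
    regret = sum((best - means[a]) * pulls[a] for a in range(2))
    return committed, pulls, regret


if __name__ == "__main__":
    committed, pulls, regret = explore_then_commit((1.0, 0.0), 1000, 50)
    print(committed, pulls, regret)
```

In this sketch the regret has two sources: the rounds spent on the worse arm during exploration, and the cost of committing to the wrong arm when the empirical means are misleading. Strategies that stop exploring at a data-dependent time follow the same two-phase structure, with the fixed loop replaced by a stopping rule.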
Nov-21-2025, 10:42:01 GMT