Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits
–Neural Information Processing Systems
In the fixed budget thresholding bandit problem, an algorithm sequentially allocates a budgeted number of samples to different distributions. It then predicts whether the mean of each distribution is larger or lower than a given threshold. We introduce a large family of algorithms (containing most existing relevant ones), inspired by the Frank-Wolfe algorithm, and provide a thorough yet generic analysis of their performance. This allowed us to construct new explicit algorithms, for a broad class of problems, whose losses are within a small constant factor of the non-adaptive oracle ones. Quite interestingly, we observed that adaptive methods empirically greatly out-perform non-adaptive oracles, an uncommon behavior in standard online learning settings, such as regret minimization. We explain this surprising phenomenon on an insightful toy problem.
Neural Information Processing Systems
Sep-30-2024, 06:09:20 GMT
- Country:
- Europe
- France
- Auvergne-Rhône-Alpes > Isère
- Grenoble (0.04)
- Hauts-de-France > Nord
- Lille (0.04)
- Auvergne-Rhône-Alpes > Isère
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- France
- North America > United States
- New York > New York County > New York City (0.04)
- Europe
- Genre:
- Research Report (0.67)
- Industry:
- Education (0.34)
- Technology: