Pure Exploration with Multiple Correct Answers
Degenne, Rémy, Koolen, Wouter M.
We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensures that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.
Feb-9-2019
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America > United States
- New York
- New York County > New York City (0.14)
- Richmond County > New York City (0.04)
- Queens County > New York City (0.04)
- Kings County > New York City (0.04)
- Bronx County > New York City (0.04)
- Florida > Broward County
- Fort Lauderdale (0.04)
- New York
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Spain > Andalusia
- Cádiz Province > Cadiz (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- United Kingdom > England
- Asia > Japan
- Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- Oceania > Australia
- Genre:
- Research Report (0.40)
- Technology: