An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit
Nakamura, Shintaro, Sugiyama, Masashi
–arXiv.org Artificial Intelligence
We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of the action set is polynomial with respect to the number of arms. In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. Existing methods in the R-CPE-MAB and transductive linear bandits have a gap of problem-dependent constant terms and logarithmic terms between the upper and lower bounds of the sample complexity, respectively. We close these gaps by proposing an algorithm named the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound. Finally, we numerically show that the CombGapE algorithm outperforms existing methods significantly.
arXiv.org Artificial Intelligence
Dec-14-2023
- Country:
- Asia > Japan
- Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe
- Spain > Andalusia
- Cádiz Province > Cadiz (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Spain > Andalusia
- North America > United States
- California > San Francisco County
- San Francisco (0.04)
- Maryland > Baltimore (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- New York (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- California > San Francisco County
- Asia > Japan
- Genre:
- Research Report (0.64)
- Technology: