An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Dec-14-2023–arXiv.org Artificial Intelligence

We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of the action set is polynomial with respect to the number of arms. In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. Existing methods in the R-CPE-MAB and transductive linear bandits have a gap of problem-dependent constant terms and logarithmic terms between the upper and lower bounds of the sample complexity, respectively. We close these gaps by proposing an algorithm named the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound. Finally, we numerically show that the CombGapE algorithm outperforms existing methods significantly.

algorithm, bandit, r-cpe-mab, (15 more...)

arXiv.org Artificial Intelligence

Dec-14-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York (0.04)
  - Maryland > Baltimore (0.04)
  - Wisconsin > Dane County
    - Madison (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - California > San Francisco County
    - San Francisco (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Spain > Andalusia
    - Cádiz Province > Cadiz (0.04)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.85)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning (1.00)