hdoc
lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap
Tsai, Tzu-Hsien, Tsai, Yun-Da, Lin, Shou-De
Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm. A good arm is defined as an arm with an expected reward greater than or equal to a given threshold. This paper focuses on the GAI problem under a small threshold gap, which refers to the distance between the expected rewards of arms and the given threshold. We propose a new algorithm called lil'HDoC to significantly improve the total sample complexity of the HDoC algorithm. We demonstrate that the sample complexity of the first $\lambda$ output arm in lil'HDoC is bounded by the original HDoC algorithm, except for one negligible term, when the distance between the expected reward and threshold is small. Extensive experiments confirm that our algorithm outperforms the state-of-the-art algorithms in both synthetic and real-world datasets.
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Colorado (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (0.69)
- Information Technology > Data Science > Data Mining > Big Data (0.51)
Good Arm Identification via Bandit Feedback
Kano, Hideaki, Honda, Junya, Sakamaki, Kentaro, Matsuura, Kentaro, Nakamura, Atsuyoshi, Sugiyama, Masashi
We consider a novel stochastic multi-armed bandit problem called {\em good arm identification} (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold. GAI is a pure-exploration problem that a single agent repeats a process of outputting an arm as soon as it is identified as a good one before confirming the other arms are actually not good. The objective of GAI is to minimize the number of samples for each process. We find that GAI faces a new kind of dilemma, the {\em exploration-exploitation dilemma of confidence}, which is different difficulty from the best arm identification. As a result, an efficient design of algorithms for GAI is quite different from that for the best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to the logarithmic factor $\mathrm{O}(\log \frac{1}{\delta})$ for acceptance error rate $\delta$. We also develop an algorithm whose sample complexity almost matches the lower bound. We also confirm experimentally that our proposed algorithm outperforms naive algorithms in synthetic settings based on a conventional bandit problem and clinical trial researches for rheumatoid arthritis.
- Health & Medicine > Therapeutic Area > Rheumatology (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Musculoskeletal (0.88)
- Information Technology > Data Science > Data Mining > Big Data (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)