Identifying the Best Transition Law
Ahmadipour, Mehrasa, Crepon, Élise, Garivier, Aurélien
arXiv.org Artificial Intelligence
Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance reached by several strategies, notably LUCB, without and with use of this knowledge of the support. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
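As a rough illustration of what "deviation bounds on each dimension independently" could look like, the sketch below builds a Hoeffding interval for each coordinate of an arm's multinomial probability vector and then takes the largest mean reward compatible with those intervals. This is not taken from the paper: the function names, the union bound over the K coordinates, and the greedy mass allocation are assumptions made for the example.

```python
import math


def hoeffding_box(counts, delta):
    """Per-coordinate Hoeffding intervals for a multinomial probability vector.

    counts[k] is how often outcome k was observed (at least one observation
    in total is required). The two-sided union bound over the K coordinates
    is an assumption of this sketch.
    """
    n = sum(counts)
    k = len(counts)
    radius = math.sqrt(math.log(2 * k / delta) / (2 * n))
    lower = [max(0.0, c / n - radius) for c in counts]
    upper = [min(1.0, c / n + radius) for c in counts]
    return lower, upper


def mean_upper_bound(support, lower, upper):
    """Largest value of sum_k p_k * support[k] over the box and the simplex."""
    p = list(lower)
    remaining = 1.0 - sum(lower)
    # Greedily push the free probability mass onto the largest rewards first.
    order = sorted(range(len(support)), key=lambda i: support[i], reverse=True)
    for i in order:
        add = min(upper[i] - p[i], remaining)
        p[i] += add
        remaining -= add
    return sum(pi * xi for pi, xi in zip(p, support))


# Hypothetical arm: known support {0, 1, 2}, 10,000 observed rewards.
support = [0.0, 1.0, 2.0]
counts = [5000, 3000, 2000]
lower, upper = hoeffding_box(counts, delta=0.1)
ucb = mean_upper_bound(support, lower, upper)
```

An LUCB-style strategy would compare such upper bounds (and the matching lower bounds, obtained by allocating mass to the smallest rewards first) across arms. The Bernstein and Empirical Likelihood variants mentioned in the abstract would replace the interval construction and are not sketched here.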
Feb-17-2025
- Country:
- Europe > France
- Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
- North America > United States
- Massachusetts > Middlesex County
- Belmont (0.04)
- New York > New York County
- New York City (0.14)
- Genre:
- Research Report (0.70)