Identifying the Best Transition Law

Ahmadipour, Mehrasa, Crepon, Élise, Garivier, Aurélien

Feb-17-2025–arXiv.org Artificial Intelligence

Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance reached by several strategies, notably LUCB, without and with the use of this knowledge. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
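To make the "each dimension independently" idea concrete, here is a minimal Python sketch of per-coordinate confidence intervals for one arm's multinomial probability vector, followed by an upper bound on the arm's mean reward over the resulting box. It is an illustration under stated assumptions, not the authors' implementation: the function names, the union bound over coordinates, the Maurer-Pontil form of the empirical Bernstein radius, and the greedy maximization step are all choices made here for the example.

```python
import math

def hoeffding_radius(n, delta):
    """Two-sided Hoeffding radius for a [0, 1]-valued mean from n samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def bernstein_radius(p_hat, n, delta):
    """Two-sided empirical Bernstein radius (Maurer-Pontil form, an assumption)."""
    var = n / (n - 1.0) * p_hat * (1.0 - p_hat)  # sample variance of an indicator
    log_term = math.log(4.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1.0))

def coordinate_intervals(counts, delta, kind="hoeffding"):
    """One interval per support point; delta is split by a union bound."""
    n, k = sum(counts), len(counts)
    intervals = []
    for c in counts:
        p_hat = c / n
        if kind == "hoeffding":
            r = hoeffding_radius(n, delta / k)
        else:
            r = bernstein_radius(p_hat, n, delta / k)
        intervals.append((max(0.0, p_hat - r), min(1.0, p_hat + r)))
    return intervals

def mean_upper_bound(support, intervals):
    """Maximize sum(p_i * x_i) over the box intersected with the simplex.

    Greedy: start from the lower bounds, then give the remaining mass
    to the largest support values first.
    """
    p = [lo for lo, _ in intervals]
    mass = max(0.0, 1.0 - sum(p))
    order = sorted(range(len(support)), key=lambda i: support[i], reverse=True)
    for i in order:
        add = min(mass, intervals[i][1] - p[i])
        p[i] += add
        mass -= add
    return sum(pi * x for pi, x in zip(p, support))

# Hypothetical arm: known support {0, 0.5, 1} and observed counts per value.
support = [0.0, 0.5, 1.0]
counts = [50, 30, 20]
for kind in ("hoeffding", "bernstein"):
    box = coordinate_intervals(counts, delta=0.05, kind=kind)
    print(kind, round(mean_upper_bound(support, box), 4))
```

An LUCB-style strategy would compare such upper bounds (and the matching lower bounds, obtained by filling the smallest support values first) across arms. The Bernstein radius tends to be tighter than Hoeffding's for coordinates whose probability is close to 0 or 1, once the sample size is large enough for the variance term to dominate.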


