Robustness of Anytime Bandit Policies

Jul-25-2011–arXiv.org Machine Learning

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy whose regret is of order log(n) with probability at least 1 - 1/n. They also show that the popular UCB1 policy of Auer et al. (2002) does not share this property. This work first answers an open question by extending this negative result to every anytime policy, that is, every policy that does not use knowledge of n. Its second contribution is the design of robust anytime policies for specific multi-armed bandit problems in which restrictions are placed on the set of possible distributions of the arms.
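For readers unfamiliar with the UCB1 policy mentioned in the abstract, the following is a minimal sketch of it, not code from the paper. It assumes Bernoulli arms and the standard index (empirical mean plus sqrt(2 ln t / n_i)); the function name `ucb1`, its parameters, and the example arm means are hypothetical choices made for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (an assumption of this sketch).

    Returns (pull counts per arm, pseudo-regret of this single run).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm was played
    sums = [0.0] * k      # total reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: play every arm once
        else:
            # Index = empirical mean + exploration bonus sqrt(2 ln t / n_i).
            # It uses the current round t only, never the horizon.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Draw a Bernoulli reward with the chosen arm's mean.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # Pseudo-regret: expected reward lost relative to always playing the best arm.
    best = max(arm_means)
    regret = sum((best - m) * c for m, c in zip(arm_means, counts))
    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.9, 0.1], horizon=2000)
    print("pulls per arm:", counts)
    print("pseudo-regret:", regret)
```

Because the index depends only on the current round and past observations, the same loop can be stopped at any time, which is what makes the policy anytime. The sketch reports the pseudo-regret of one run only; studying the deviations discussed in the abstract would require looking at the distribution of the regret over many independent runs.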
