Functional Bandits

May-10-2014–arXiv.org Machine Learning

The stochastic multi-armed bandit (MAB) model consists of a slot machine with K arms (or actions), each of which delivers rewards that are independently and randomly drawn from an unknown distribution when pulled. In the optimalarm identification problem, the aim is to find an arm with the highest expected reward value. To do so, we can pull the arms and learn (i.e., estimate) their mean rewards. That is, our goal is to distribute a finite budget of T pulls among the arms, such that at the end of the process, we can identify the optimal arm as accurately as possible. This stochastic optimisation problem models many practical applications, ranging from keyword bidding strategy optimisation in sponsored search[Amin et al., 2012], to identifying the best medicines in medical trials [Robbins, 1952], and efficient transmission channel detection in wireless communication networks [Avner, Mannor, and Shamir, 2012]. Although this MAB optimisation model is a well-studied in the online learning community, the focus is on finding the arm with the highest expected reward value [Maron and Moore, 1993, Mnih, Szepesvári, and Audibert, 2008, Audibert, Bubeck, and Munos, 2010b, Karnin, Koren, and Somekh, 2013].

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

May-10-2014

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom (0.28)

Genre:
- Research Report (0.50)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.35)
  - Artificial Intelligence
    - Representation & Reasoning (0.93)
    - Machine Learning > Statistical Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found