Functional Bandits

Tran-Thanh, Long; Yu, Jia Yuan

arXiv.org Machine Learning 

The stochastic multi-armed bandit (MAB) model consists of a slot machine with K arms (or actions), each of which, when pulled, delivers a reward drawn independently at random from an unknown distribution. In the optimal arm identification problem, the aim is to find an arm with the highest expected reward value. To do so, we can pull the arms and learn (i.e., estimate) their mean rewards. That is, our goal is to distribute a finite budget of T pulls among the arms such that, at the end of the process, we can identify the optimal arm as accurately as possible. This stochastic optimisation problem models many practical applications, ranging from keyword bidding strategy optimisation in sponsored search [Amin et al., 2012] to identifying the best medicines in medical trials [Robbins, 1952] and efficient transmission channel detection in wireless communication networks [Avner, Mannor, and Shamir, 2012]. Although this MAB optimisation model is well studied in the online learning community, that work has focused on finding the arm with the highest expected reward value [Maron and Moore, 1993, Mnih, Szepesvári, and Audibert, 2008, Audibert, Bubeck, and Munos, 2010b, Karnin, Koren, and Somekh, 2013].
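The fixed-budget setting described above can be illustrated with a minimal sketch. It assumes Bernoulli-distributed rewards and uses the naive uniform (round-robin) allocation of the T pulls, followed by recommending the arm with the highest empirical mean. This is a baseline for illustration only, not the allocation strategy of the paper or of the cited works; the names `bernoulli_arm` and `uniform_best_arm` are hypothetical.

```python
import random
from typing import Callable, Sequence

# Hypothetical arm model: an arm is a zero-argument function returning a random reward.
Arm = Callable[[], float]


def bernoulli_arm(p: float, rng: random.Random) -> Arm:
    """Return an arm that yields reward 1.0 with probability p, else 0.0 (assumed reward model)."""
    return lambda: 1.0 if rng.random() < p else 0.0


def uniform_best_arm(arms: Sequence[Arm], budget: int) -> int:
    """Spread `budget` pulls round-robin over the arms and return the index
    of the arm with the highest empirical mean reward."""
    k = len(arms)
    if budget < k:
        raise ValueError("budget must allow at least one pull per arm")
    totals = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        i = t % k  # round-robin: each arm gets budget // k or budget // k + 1 pulls
        totals[i] += arms[i]()
        counts[i] += 1
    means = [totals[i] / counts[i] for i in range(k)]
    # max() returns the first maximiser, so ties go to the lowest index.
    return max(range(k), key=lambda i: means[i])


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [bernoulli_arm(p, rng) for p in (0.2, 0.5, 0.8)]
    print(uniform_best_arm(arms, budget=3000))
```

With 1000 pulls per arm and means 0.2, 0.5 and 0.8, the sketch should recommend arm 2 with overwhelming probability. Uniform allocation spends as many pulls on clearly inferior arms as on close contenders; adaptive strategies typically improve on it by discarding clearly suboptimal arms early and spending the remaining budget on arms that are hard to tell apart.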
