Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

Oct-13-2014, arXiv.org Machine Learning

In the contextual bandit problem, an agent collects rewards for actions taken over a sequence of rounds; in each round, the agent chooses an action to take on the basis of (i) context (or features) for the current round, as well as (ii) feedback, in the form of rewards, obtained in previous rounds. The feedback is incomplete: in any given round, the agent observes the reward only for the chosen action; the agent does not observe the reward for other actions. Contextual bandit problems are found in many important applications such as online recommendation and clinical trials, and represent a natural halfway point between supervised learning and reinforcement learning. The use of features to encode context is inherited from supervised machine learning, while exploration is necessary for good performance as in reinforcement learning. The choice of exploration distribution on actions is important. The strongest known results (Auer et al., 2002; McMahan and Streeter, 2009; Beygelzimer et al., 2011) provide algorithms that carefully control the exploration distribution to achieve an optimal regret after $T$ rounds of $O(\sqrt{KT \log(|\Pi|/\delta)})$, with probability at least $1 - \delta$, relative to a set of policies $\Pi$ ...
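The interaction protocol described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithm from the paper: it uses plain epsilon-greedy exploration with inverse-propensity-scored (IPS) reward estimates over a tiny, hypothetical policy set. The names `run_bandit`, `reward_fn`, and `epsilon`, as well as the toy contexts and rewards, are made up for illustration.

```python
import random


def run_bandit(policies, contexts, reward_fn, epsilon=0.1, num_actions=2, seed=0):
    """Epsilon-greedy contextual bandit loop with IPS estimates (illustrative only)."""
    rng = random.Random(seed)
    # Running inverse-propensity-scored reward totals, one per policy.
    ips_totals = [0.0] * len(policies)
    total_reward = 0.0
    for x in contexts:
        # Follow the policy with the best estimated total reward so far.
        best = max(range(len(policies)), key=lambda i: ips_totals[i])
        greedy_action = policies[best](x)
        # Explore uniformly with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(num_actions)
        else:
            action = greedy_action
        # Probability with which the chosen action was selected this round.
        prob = epsilon / num_actions + (1.0 - epsilon) * (action == greedy_action)
        # Bandit feedback: only the chosen action's reward is revealed.
        reward = reward_fn(x, action)
        total_reward += reward
        # IPS update for every policy that would have chosen the same action.
        for i, pi in enumerate(policies):
            if pi(x) == action:
                ips_totals[i] += reward / prob
    return total_reward, ips_totals


# Toy problem (hypothetical): the context is 0 or 1, and the reward is 1
# exactly when the action matches the context.
policies = [lambda x: 0, lambda x: 1, lambda x: x]
contexts = [t % 2 for t in range(2000)]
total, ips = run_bandit(policies, contexts, lambda x, a: 1.0 if a == x else 0.0)
```

Dividing each observed reward by the probability of the chosen action is what lets the agent score policies it did not follow, even though it never sees the rewards of the other actions. Uniform epsilon-greedy exploration is a much cruder choice of exploration distribution than the carefully controlled ones the cited results rely on.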

data mining, machine learning, reinforcement learning

Genre:
- Research Report > New Finding (0.34)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.68)
  - Artificial Intelligence > Machine Learning
    - Reinforcement Learning (0.54)
