Regularized Contextual Bandits

Xavier Fontaine, Quentin Berthet, Vianney Perchet

arXiv.org Machine Learning 

In sequential optimization problems, an agent takes successive decisions in order to minimize an unknown loss function. An important class of such problems, nowadays known as bandit problems, was mathematically formalized by Robbins in his seminal paper (Robbins, 1952). In the so-called stochastic multi-armed bandit problem, an agent chooses to sample (or "pull") among K arms returning random rewards. Only the rewards of the selected arms are revealed to the agent, who does not get any additional feedback. Bandit problems naturally model the exploration/exploitation tradeoff that arises in sequential decision making under uncertainty. Various general algorithms have been proposed to solve this problem, following the work of Lai and Robbins (1985), who obtained logarithmic regret for their sample-mean-based policy. Further bounds have been obtained by Agrawal (1995) and Auer et al. (2002), who developed different versions of the well-known UCB algorithm. The setting of classical stochastic multi-armed bandits is unfortunately too restrictive for real-world applications: the choice of the agent can and should be influenced by additional information (referred to as "context" or "covariate") revealed by the environment.
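To make the classical setting concrete, the following Python sketch runs a standard UCB1-style index policy (in the spirit of Auer et al., 2002) on a K-armed stochastic bandit with Bernoulli rewards. The function and arm names, the Bernoulli reward model, and the horizon are illustrative assumptions, not the algorithm studied in this paper.

import numpy as np

def ucb1(reward_fns, horizon):
    # Standard UCB1 index policy on a K-armed stochastic bandit.
    # reward_fns: list of K callables, each returning a random reward in [0, 1];
    # only the pulled arm's reward is observed (bandit feedback).
    K = len(reward_fns)
    counts = np.zeros(K)   # number of pulls per arm
    means = np.zeros(K)    # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1    # pull each arm once to initialize the estimates
        else:
            # UCB index: empirical mean plus an exploration bonus sqrt(2 log t / n_i)
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(means + bonus))
        reward = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return total_reward, means

# Example: three Bernoulli arms with unknown means 0.3, 0.5 and 0.7 (assumed values).
rng = np.random.default_rng(0)
arms = [lambda p=p: float(rng.random() < p) for p in (0.3, 0.5, 0.7)]
total, estimates = ucb1(arms, horizon=10_000)
print(total, estimates)

In this sketch the agent observes no context: the exploration bonus depends only on pull counts, which is precisely the restriction that contextual bandit formulations, such as the one studied here, aim to lift.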
