Thompson Sampling for Noncompliant Bandits

Dec-3-2018–arXiv.org Machine Learning

Multi-armed bandit (MAB) problems (Sutton and Barto, 1998) are a class of sequential decision-making problems in which an agent seeks to maximize rewards by acting in an unknown stationary environment. The MAB problem is often caricatured as a set of slot machines with unknown payout distributions, where the agent must decide which arm to pull in order to maximize earnings. Because the machines' reward distributions are initially unknown, the agent must select actions that balance exploration (learning the reward distributions) with exploitation (playing the machine with the highest expected reward). Contextual bandits (CB) (Li et al., 2010a) are a slightly modified MAB problem in which the reward distributions are conditioned on an observation that is revealed to the agent before it selects an action.
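To make the exploration and exploitation trade-off concrete, the following is a minimal sketch of standard Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors. It illustrates the textbook algorithm only, not the noncompliant variant the paper studies. The function name `thompson_sampling`, its parameters, and the simulated payout probabilities are assumptions made for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical per-arm success probabilities (hidden from the agent).
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1, per arm
    failures = [0] * n_arms   # observed rewards of 0, per arm
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible mean reward per arm from its Beta posterior.
        # Uncertain arms produce widely spread draws, which drives exploration.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Exploit: play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate the environment: a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the posterior counts for the chosen arm only.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

As the posterior for each arm sharpens, draws for clearly inferior arms rarely exceed those of the best arm, so the agent shifts from exploring to exploiting without an explicit exploration schedule.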

data mining, machine learning, reinforcement learning

Country:
- Europe (1.00)
- North America > United States (0.67)

Genre:
- Research Report > Experimental Study (0.46)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.66)
  - Artificial Intelligence
    - Machine Learning > Reinforcement Learning (0.68)
    - Representation & Reasoning > Uncertainty (0.46)
