Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards

Feb-4-2019–arXiv.org Machine Learning

Multi-armed bandits were first introduced in the landmark paper by Robbins (1952). The development of multi-armed bandit methodology has been partly motivated by clinical trials, which must balance two competing goals: 1) identifying the best treatment (exploration) and 2) treating patients as effectively as possible during the trial (exploitation). The classic formulation of the multi-armed bandit problem in the context of clinical practice is as follows: there are l treatments (arms) for a disease. For each patient, the doctor (decision maker) chooses one of the l available treatments, which results in a reward (response) measuring the improvement in the patient's condition. The goal is to maximize the cumulative reward.
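The classic formulation above can be illustrated with a small simulation. This is a minimal sketch, not the method of the paper: it assumes Bernoulli (success/failure) rewards that are observed immediately, ignores covariates and delays, and uses a simple epsilon-greedy rule as a stand-in allocation strategy. The function name `simulate_bandit` and the parameters `success_probs` and `epsilon` are hypothetical choices made for illustration.

```python
import random


def simulate_bandit(success_probs, n_patients, epsilon=0.1, seed=0):
    """Run an epsilon-greedy allocation over Bernoulli arms.

    success_probs: assumed true success probability of each treatment (arm).
    Returns (cumulative_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms          # how many patients received each treatment
    reward_sums = [0.0] * n_arms  # total observed reward per treatment
    cumulative_reward = 0.0

    for _ in range(n_patients):
        untried = [a for a in range(n_arms) if pulls[a] == 0]
        if untried:
            # Try every treatment once before comparing averages.
            arm = untried[0]
        elif rng.random() < epsilon:
            # Exploration: pick a treatment uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the treatment with the best observed mean.
            arm = max(range(n_arms), key=lambda a: reward_sums[a] / pulls[a])

        # Observe the patient's response (1 = improvement, 0 = none).
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
        cumulative_reward += reward

    return cumulative_reward, pulls


if __name__ == "__main__":
    total, pulls = simulate_bandit([0.2, 0.5, 0.8], n_patients=1000)
    print("cumulative reward:", total)
    print("patients per treatment:", pulls)
```

Setting `epsilon` to 0 gives a purely greedy doctor who never explores after the first round, while larger values spend more patients on learning about each treatment; the tension between the two is the exploration and exploitation trade-off described above.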

bandit problem, covariate, delay distribution, (15 more...)

Country:
- North America > United States
  - Minnesota (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report
  - Experimental Study (0.68)
  - New Finding (0.48)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (1.00)
