Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards

Arya, Sakshi, Yang, Yuhong

arXiv.org Machine Learning 

Multi-armed bandits were first introduced in the landmark paper by Robbins (1952). The development of multi-armed bandit methodology has been partly motivated by clinical trials, where the aim is to balance two competing goals: 1) to effectively identify the best treatment (exploration) and 2) to treat patients as effectively as possible during the trial (exploitation). The classic formulation of the multi-armed bandit problem in the context of clinical practice is as follows: there are l treatments (arms) available to treat a disease. For each patient, the doctor (decision maker) has to choose one of the l available treatments, which results in a reward (response) reflecting the improvement in the patient's condition. The goal is to maximize the cumulative reward.
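
The classic formulation can be illustrated with a small simulation. The sketch below is not the randomized allocation method of the paper: it uses a generic epsilon-greedy rule on Bernoulli rewards, with no covariates and no delay, purely to show the exploration/exploitation trade-off. The function name `run_bandit`, the response rates, and the `epsilon` parameter are all assumptions made for illustration.

```python
import random


def run_bandit(success_probs, n_patients, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy policy on Bernoulli arms.

    success_probs: hypothetical response rate of each treatment (arm).
    Returns (cumulative_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms      # number of patients given each treatment
    totals = [0.0] * n_arms   # summed rewards observed for each treatment
    cumulative_reward = 0

    for _ in range(n_patients):
        untried = [a for a in range(n_arms) if pulls[a] == 0]
        if untried:
            # Try every treatment once before comparing averages.
            arm = untried[0]
        elif rng.random() < epsilon:
            # Exploration: pick a treatment uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the treatment with the best observed mean.
            arm = max(range(n_arms), key=lambda a: totals[a] / pulls[a])

        # Binary response: 1 if the patient improves, 0 otherwise.
        reward = 1 if rng.random() < success_probs[arm] else 0
        pulls[arm] += 1
        totals[arm] += reward
        cumulative_reward += reward

    return cumulative_reward, pulls


if __name__ == "__main__":
    reward, pulls = run_bandit([0.2, 0.5, 0.8], n_patients=500)
    print("cumulative reward:", reward)
    print("patients per treatment:", pulls)
```

Setting `epsilon` to 0 gives a purely greedy doctor who never revisits a treatment that looked poor early on, while a larger `epsilon` spends more patients on exploration at the cost of reward during the trial.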
