Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards
Multi-armed bandits were first introduced in the landmark paper by Robbins (1952). The development of multi-armed bandit methodology has been partly motivated by clinical trials, which must balance two competing goals: 1) to identify the best treatment effectively (exploration) and 2) to treat patients as effectively as possible during the trial (exploitation). The classic formulation of the multi-armed bandit problem in the context of clinical practice is as follows: there are l treatments (arms) for a disease. For each patient, the doctor (decision maker) has to choose one of the l available treatments, which results in a reward (response) reflecting the improvement in the patient's condition. The goal is to maximize the cumulative reward.
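The classic formulation above can be illustrated with a minimal simulation sketch. It is not the randomized allocation strategy of the paper: it uses a simple epsilon-greedy rule as a stand-in, ignores patient covariates (contexts) and delayed rewards, and assumes binary (Bernoulli) responses. The function name `simulate_bandit` and the parameters `success_probs`, `epsilon`, and `seed` are hypothetical, chosen only for illustration.

```python
import random


def simulate_bandit(success_probs, n_patients, epsilon=0.1, seed=0):
    """Assign each patient one of l treatments; return (total_reward, counts).

    Assumptions (not from the source): Bernoulli responses and an
    epsilon-greedy allocation rule used purely as an illustration.
    """
    rng = random.Random(seed)
    l = len(success_probs)      # number of treatments (arms)
    counts = [0] * l            # patients assigned to each treatment
    means = [0.0] * l           # running mean response per treatment
    total_reward = 0            # cumulative reward over all patients

    for _ in range(n_patients):
        if rng.random() < epsilon:
            # Exploration: try a treatment chosen uniformly at random.
            arm = rng.randrange(l)
        else:
            # Exploitation: use the treatment with the best observed mean.
            arm = max(range(l), key=lambda a: means[a])

        # Observe the patient's response (1 = improvement, 0 = none).
        reward = 1 if rng.random() < success_probs[arm] else 0

        # Update the running mean for the chosen treatment.
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward

    return total_reward, counts


if __name__ == "__main__":
    total, counts = simulate_bandit([0.2, 0.5, 0.8], n_patients=1000)
    print("cumulative reward:", total)
    print("patients per treatment:", counts)
```

Raising `epsilon` spends more patients on exploration, while lowering it favors exploitation of the treatment that currently looks best; this is the trade-off the paragraph describes.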
Feb-4-2019
- Country:
- North America > United States
- Minnesota (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Genre:
- Research Report
- Experimental Study (0.68)
- New Finding (0.48)