Asymptotic Randomised Control with applications to bandits
Cohen, Samuel N., Treetanthiploet, Tanut
We formulate a general multi-armed bandit problem, with correlated (and simple contextual and restless) elements, as a relaxed control problem. By introducing an entropy premium, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process, obtained numerically by solving a fixed point problem, which can be interpreted as explicitly balancing an exploration-exploitation trade-off. The resulting Asymptotic Randomised Control (ARC) algorithm compares favourably with other approaches to correlated multi-armed bandits.
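To illustrate why an entropy premium produces a smooth approximation, the following minimal sketch uses the standard identity that maximising expected reward plus `lam` times Shannon entropy over probability vectors gives a log-sum-exp value, attained by a softmax policy. This is not the ARC algorithm and does not solve the fixed point problem described above; the function names, the parameter `lam`, and the reward estimates are hypothetical and chosen only for illustration.

```python
import math


def entropy_smoothed_max(values, lam):
    """Return the entropy-regularised value and the softmax policy.

    Solves max_p sum_i p_i * values[i] + lam * H(p) over probability
    vectors p, where H is Shannon entropy and lam > 0 (an assumed
    smoothing parameter, not a quantity taken from the paper).
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / lam) for v in values]
    total = sum(weights)
    probs = [w / total for w in weights]
    smooth = m + lam * math.log(total)  # log-sum-exp form of the value
    return smooth, probs


def regularised_objective(values, probs, lam):
    """Expected reward plus lam times the entropy of probs."""
    expected = sum(p * v for p, v in zip(probs, values))
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return expected + lam * entropy


# Hypothetical reward estimates for three arms.
estimates = [1.0, 0.5, 0.2]
for lam in (1.0, 0.1, 0.01):
    value, policy = entropy_smoothed_max(estimates, lam)
    print(lam, round(value, 4), [round(p, 4) for p in policy])
```

As `lam` shrinks, the smoothed value approaches the largest estimate and the policy concentrates on the best arm; a larger `lam` spreads probability across arms, which is one simple way to read the exploration-exploitation balance.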
Oct-14-2020
- Country:
  - North America > United States (0.14)
  - Asia > Thailand (0.14)
  - Europe > United Kingdom > England (0.14)
- Genre:
  - Research Report (0.49)
- Industry:
  - Government (0.45)
  - Energy > Oil & Gas > Upstream (0.47)