Asymptotic Randomised Control with applications to bandits

Cohen, Samuel N., Treetanthiploet, Tanut

Oct-14-2020, arXiv.org Machine Learning

We consider a general multi-armed bandit problem with correlated (and simple contextual and restless) elements, as a relaxed control problem. By introducing an entropy premium, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process, obtained numerically by solving a fixed point problem, which can be interpreted as explicitly balancing an exploration-exploitation trade-off. Performance of the resulting Asymptotic Randomised Control (ARC) algorithm compares favourably with other approaches to correlated multi-armed bandits.
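As a rough illustration of how an entropy term can smooth a maximum, the sketch below uses the standard identity that maximising expected reward plus a scaled Shannon entropy over randomised choices gives a log-sum-exp value, attained by a softmax policy. This shows the generic entropy-regularisation idea only and is not the ARC algorithm from the paper. The function names, the reward vector and the parameter `lam` are all assumptions made for the example.

```python
import math

def smooth_max(values, lam):
    """Entropy-regularised maximum: lam * log(sum(exp(v / lam)))."""
    m = max(values)  # shift by the max for numerical stability
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))

def softmax_policy(values, lam):
    """Randomised choice probabilities that attain the smooth maximum."""
    m = max(values)
    weights = [math.exp((v - m) / lam) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical expected rewards of three arms (illustrative numbers only).
values = [1.0, 0.5, 0.2]
lam = 0.1  # assumed weight on the entropy term

probs = softmax_policy(values, lam)
# Expected reward plus the entropy bonus under the softmax policy.
regularised = sum(p * v for p, v in zip(probs, values)) + lam * entropy(probs)
```

Here `regularised` agrees with `smooth_max(values, lam)` up to floating-point error, and as `lam` shrinks the smooth maximum approaches the ordinary maximum while the policy concentrates on the best arm. A larger `lam` spreads probability across arms, which is one way to read the exploration-exploitation trade-off mentioned in the abstract.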

bandit, big data, upstream oil & gas

Country:
- North America > United States (0.14)
- Asia > Thailand (0.14)
- Europe > United Kingdom
  - England (0.14)

Genre:
- Research Report (0.49)

Industry:
- Government (0.45)
- Energy > Oil & Gas
  - Upstream (0.47)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning (1.00)
