Optimal Best Markovian Arm Identification with Fixed Confidence
We give a complete characterization of the sampling complexity of best Markovian arm identification in one-parameter Markovian bandit models. We derive instance specific nonasymptotic and asymptotic lower bounds which generalize those of the IID setting. We analyze the Track-and-Stop strategy, initially proposed for the IID setting, and we prove that asymptotically it is at most a factor of four apart from the lower bound. Our one-parameter Markovian bandit model is based on the notion of an exponential family of stochastic matrices for which we establish many useful properties. For the analysis of the Track-and-Stop strategy we derive a novel concentration inequality for Markov chains that may be of interest in its own right.
Dec-21-2019
- Country:
- North America > United States
- New York (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- New Jersey > Hudson County
- Hoboken (0.04)
- California
- Santa Clara County > Palo Alto (0.04)
- Alameda County > Berkeley (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Asia
- Middle East > Jordan (0.04)
- Japan > Kyūshū & Okinawa
- Okinawa (0.04)
- North America > United States
- Genre:
- Research Report (0.50)