Variational Regret Bounds for Reinforcement Learning

Pratik Gajane, Ronald Ortner, Peter Auer

arXiv.org Machine Learning 

For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.

For reinforcement learning in MDPs with changes in reward function and transition probabilities, we provide an algorithm, UCRL with Restarts, a version of UCRL [Jaksch et al., 2010], which restarts according to a schedule dependent on the variation in the MDP (defined in Section 2 below). We derive a high-probability upper bound on the cumulative regret of our algorithm of …
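The restart mechanism is simple to state in code. Below is a minimal sketch, not the paper's implementation: it assumes, for concreteness, a cubic restart schedule ⌈i³/Δ²⌉ in the style of the restart scheme Jaksch et al. [2010] use for a known number of changes, with the variation Δ substituted in; the paper's actual schedule is defined in its Section 2, and `run_ucrl2_phase` is a hypothetical stand-in for one run of UCRL2 between restarts.

```python
import math

def restart_times(horizon: int, variation: float) -> list[int]:
    """Steps at which the learner restarts (hypothetical cubic schedule
    ceil(i^3 / Delta^2), increasing in i, truncated at the horizon)."""
    times, i = [], 1
    while True:
        t = math.ceil(i ** 3 / variation ** 2)
        if t > horizon:
            return times
        if not times or t > times[-1]:   # drop duplicate steps for small i
            times.append(t)
        i += 1

def ucrl_with_restarts(run_ucrl2_phase, horizon: int, variation: float) -> None:
    """Restart a UCRL2-style learner at each scheduled step, discarding the
    confidence sets built under possibly outdated rewards and transitions."""
    starts = sorted(set([1] + restart_times(horizon, variation)))
    starts.append(horizon + 1)                    # sentinel: end of horizon
    for begin, end in zip(starts, starts[1:]):
        run_ucrl2_phase(begin, end)               # fresh statistics each phase

# Example: print the phase boundaries for T = 10_000 and variation Delta = 2.
ucrl_with_restarts(lambda b, e: print(f"phase [{b}, {e})"), 10_000, 2.0)
```

The point of the schedule is the trade-off it encodes: restarting more often tracks a fast-changing MDP better but discards useful statistics, and the schedule balances the two as a function of the variation Δ.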
