Variational Regret Bounds for Reinforcement Learning
Pratik Gajane, Ronald Ortner, Peter Auer
For reinforcement learning in MDPs with changes in reward function and transition probabilities, we provide an algorithm, UCRL with Restarts, a version of UCRL [Jaksch et al., 2010], which restarts according to a schedule dependent on the variation in the MDP (defined in Section 2 below). For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We derive a high-probability upper bound on the cumulative regret of our algorithm; the upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.
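The core idea of restarting a stationary learner on a schedule tied to the variation budget can be illustrated with a short sketch. Note this is a hypothetical illustration, not the paper's actual schedule: the function `restart_times`, its cube-root phase count, and the assumption that the variation bound is known in advance are all choices made here for exposition.

```python
import math

def restart_times(T, variation):
    """Return the time steps at which a learner (e.g. UCRL) is restarted.

    Hypothetical sketch of a variation-dependent restart schedule:
    the learner's statistics are wiped at each returned time step so
    that stale observations from an earlier MDP are discarded.

    T         -- total number of time steps (horizon)
    variation -- assumed known bound on the total variation of the MDP
    """
    # Illustrative choice: more variation -> more (hence shorter) phases.
    k = max(1, math.ceil((variation * T) ** (1.0 / 3.0)))
    phase_len = max(1, T // k)
    return list(range(phase_len, T, phase_len))
```

For example, with horizon `T = 1000` and variation budget `1`, this schedule splits the horizon into 10 equal phases and restarts the learner every 100 steps. With zero variation it returns no restart times, recovering the stationary setting.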
May-23-2019