Online Reinforcement Learning for Periodic MDP

Jul-25-2022–arXiv.org Artificial Intelligence

We study learning in periodic Markov Decision Process(MDP), a special type of non-stationary MDP where both the state transition probabilities and reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period and as sub-linear with the horizon length. Numerical results demonstrate the efficacy of PUCRL2.

algorithm, mdp, probability, (15 more...)

arXiv.org Artificial Intelligence

Jul-25-2022

arXiv.org PDF

Add feedback

Genre:
- Instructional Material > Online (0.40)
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (0.86)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.35)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found