On the connection between Bregman divergence and value in regularized Markov decision processes

O'Donoghue, Brendan

arXiv.org (Artificial Intelligence)

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others. The main result of this manuscript holds more generally, but for brevity we shall restrict ourselves to this case. To prove our main result we require a slight generalization of the performance difference lemma (PDL) [1] to cover the regularized MDP case. The proof of this identity is included in the appendix for completeness.
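As a concrete illustration of the kind of identity the abstract describes, consider the entropy-regularized special case, in which the Bregman divergence generated by the negative-entropy regularizer reduces to the KL divergence. The sketch below uses standard soft-RL notation that is assumed here rather than taken from the excerpt: a temperature \tau > 0, a discount factor \gamma \in [0,1), regularized values V^\pi_\tau and Q^\pi_\tau, the entropy \mathcal{H}(\pi(\cdot \mid s)) = -\sum_a \pi(a \mid s) \log \pi(a \mid s), and the normalized discounted state-visitation distribution d^\pi_\rho from an initial distribution \rho. The soft Bellman optimality conditions read

\[
  \pi^\star(a \mid s) \propto \exp\!\bigl(Q^\star_\tau(s,a)/\tau\bigr),
  \qquad
  V^\star_\tau(s) = \tau \log \sum_a \exp\!\bigl(Q^\star_\tau(s,a)/\tau\bigr),
\]

and one common form of the regularized performance difference lemma under entropy regularization (a standard statement, not necessarily the exact generalization proved in the note) is

\[
  V^\pi_\tau(\rho) - V^{\pi'}_\tau(\rho)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^\pi_\rho}\!\Bigl[
      \mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[Q^{\pi'}_\tau(s,a)\bigr]
      + \tau\,\mathcal{H}\bigl(\pi(\cdot \mid s)\bigr)
      - V^{\pi'}_\tau(s)
    \Bigr].
\]

Setting \pi' = \pi^\star and substituting Q^\star_\tau(s,a) = V^\star_\tau(s) + \tau \log \pi^\star(a \mid s), which follows from the softmax form above, collapses the bracketed term to -\tau\,\mathrm{KL}\bigl(\pi(\cdot \mid s) \,\|\, \pi^\star(\cdot \mid s)\bigr), giving

\[
  V^\star_\tau(\rho) - V^\pi_\tau(\rho)
  = \frac{\tau}{1-\gamma}\,
    \mathbb{E}_{s \sim d^\pi_\rho}\!\bigl[
      \mathrm{KL}\bigl(\pi(\cdot \mid s) \,\|\, \pi^\star(\cdot \mid s)\bigr)
    \bigr].
\]

Since the KL divergence is the Bregman divergence generated by negative entropy, the suboptimality of the current value function in this special case is exactly the temperature-scaled, visitation-weighted Bregman divergence from the current policy to the optimal one.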
