On the connection between Bregman divergence and value in regularized Markov decision processes

O'Donoghue, Brendan

arXiv.org (Artificial Intelligence)

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others. The main result of this manuscript holds more generally, but for brevity we shall restrict ourselves to this case. To prove our main result we require a slight generalization of the performance difference lemma (PDL) [1] to cover the regularized MDP case. The proof of this identity is included in the appendix for completeness.
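As a concrete illustration of the kind of identity the abstract describes, consider the entropy-regularized special case, in which the Bregman divergence generated by the negative-entropy regularizer reduces to the KL divergence. The sketch below uses standard soft-RL notation that is assumed here rather than taken from the excerpt: a temperature \tau > 0, a discount factor \gamma \in [0,1), regularized values V^\pi_\tau and Q^\pi_\tau, the entropy \mathcal{H}(\pi(\cdot \mid s)) = -\sum_a \pi(a \mid s) \log \pi(a \mid s), and the normalized discounted state-visitation distribution d^\pi_\rho from an initial distribution \rho. The soft Bellman optimality conditions read

\[
  \pi^\star(a \mid s) \propto \exp\!\bigl(Q^\star_\tau(s,a)/\tau\bigr),
  \qquad
  V^\star_\tau(s) = \tau \log \sum_a \exp\!\bigl(Q^\star_\tau(s,a)/\tau\bigr),
\]

and one common form of the regularized performance difference lemma under entropy regularization (a standard statement, not necessarily the exact generalization proved in the note) is

\[
  V^\pi_\tau(\rho) - V^{\pi'}_\tau(\rho)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^\pi_\rho}\!\Bigl[
      \mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[Q^{\pi'}_\tau(s,a)\bigr]
      + \tau\,\mathcal{H}\bigl(\pi(\cdot \mid s)\bigr)
      - V^{\pi'}_\tau(s)
    \Bigr].
\]

Setting \pi' = \pi^\star and substituting Q^\star_\tau(s,a) = V^\star_\tau(s) + \tau \log \pi^\star(a \mid s), which follows from the softmax form above, collapses the bracketed term to -\tau\,\mathrm{KL}\bigl(\pi(\cdot \mid s) \,\|\, \pi^\star(\cdot \mid s)\bigr), giving

\[
  V^\star_\tau(\rho) - V^\pi_\tau(\rho)
  = \frac{\tau}{1-\gamma}\,
    \mathbb{E}_{s \sim d^\pi_\rho}\!\bigl[
      \mathrm{KL}\bigl(\pi(\cdot \mid s) \,\|\, \pi^\star(\cdot \mid s)\bigr)
    \bigr].
\]

Since the KL divergence is the Bregman divergence generated by negative entropy, the suboptimality of the current value function in this special case is exactly the temperature-scaled, visitation-weighted Bregman divergence from the current policy to the optimal one.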
