Rate-Optimal Policy Optimization for Linear Markov Decision Processes

Uri Sherman, Alon Cohen, Tomer Koren, Yishay Mansour

arXiv.org Artificial Intelligence 

Policy Optimization (PO) algorithms are a class of methods in Reinforcement Learning (RL; Sutton and Barto, 2018; Mannor et al., 2022) in which the agent's policy is iteratively updated according to the (possibly preconditioned) gradient of the value function w.r.t.