Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Sherman, Uri, Cohen, Alon, Koren, Tomer, Mansour, Yishay
arXiv.org Artificial Intelligence
Policy Optimization (PO) algorithms are a class of methods in Reinforcement Learning (RL; Sutton and Barto, 2018; Mannor et al., 2022) where the agent's policy is iteratively updated according to the (possibly preconditioned) gradient of the value function w.r.t.
Feb-15-2024