Structure Matters: Dynamic Policy Gradient

Neural Information Processing Systems 

In this work, we study $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs) and introduce a framework called dynamic policy gradient (DynPG).