Supplementary Material: Continuous MDP Homomorphisms and Homomorphic Policy Gradient

Neural Information Processing Systems 

Reinforcement learning (RL) algorithms can be broadly divided into value-based and policy-gradient (PG) methods.