Temporal Regularization for Markov Decision Process

Pierre Thodoroff, Audrey Durand, Joelle Pineau, Doina Precup

Neural Information Processing Systems 

Yetinreinforcementlearning,duetothenatureofthe Bellman equation, there isanopportunity toalsoexploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization.