
Yetinreinforcementlearning,duetothenatureofthe Bellman equation, there isanopportunity toalsoexploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization.