Temporal Regularization for Markov Decision Process
Thodoroff, Pierre, Durand, Audrey, Pineau, Joelle, Precup, Doina
–Neural Information Processing Systems
Several applications of Reinforcement Learning suffer from instability due to high variance. This is especially prevalent in high dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories.
Neural Information Processing Systems
Feb-14-2020, 08:43:13 GMT