A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning
Yang, Wenhao, Li, Xiang, Zhang, Zhihua
Neural Information Processing Systems
We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term. The extant entropy-regularized MDPs can be cast into our framework. Moreover, under our framework, many regularization terms can bring multi-modality and sparsity, which are potentially useful in reinforcement learning. In particular, we present sufficient and necessary conditions that induce a sparse optimal policy. We also conduct a full mathematical analysis of the proposed regularized MDPs, including the optimality condition, performance error, and sparseness control.
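As a rough illustration of how the choice of regularizer can decide between a dense and a sparse optimal policy, the sketch below solves a single-state version of the regularized problem for two regularizers. It is not taken from the paper. The function names, the coefficient `lam`, and the specific Tsallis-style regularizer `(lam / 2) * (1 - sum(pi_i ** 2))` are assumptions chosen for illustration. With Shannon entropy the maximizer is a softmax, which gives every action positive probability. With the quadratic regularizer the maximizer is the Euclidean projection of `q / lam` onto the probability simplex, which can assign exactly zero probability to low-value actions.

```python
import math

def softmax_policy(q, lam):
    """Maximizer of sum(pi_i * q_i) + lam * ShannonEntropy(pi) over the simplex."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((x - m) / lam) for x in q]
    total = sum(weights)
    return [w / total for w in weights]

def sparsemax_policy(q, lam):
    """Maximizer of sum(pi_i * q_i) + (lam / 2) * (1 - sum(pi_i ** 2)).

    Equivalent to projecting q / lam onto the probability simplex
    (assumed regularizer; chosen here only for illustration).
    """
    z = [x / lam for x in q]
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for i, v in enumerate(z_sorted, start=1):
        cumsum += v
        if 1 + i * v > cumsum:
            # action i is still in the support; update the threshold
            tau = (cumsum - 1) / i
        else:
            break
    return [max(x - tau, 0.0) for x in z]

if __name__ == "__main__":
    q_values = [1.0, 0.9, 0.1]  # hypothetical action values for one state
    print("softmax  :", softmax_policy(q_values, lam=0.5))
    print("sparsemax:", sparsemax_policy(q_values, lam=0.5))
```

In this toy setting, shrinking `lam` concentrates both policies on the best action, but only the quadratic regularizer removes actions from the support entirely.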
Mar-18-2020, 22:49:15 GMT