A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning
Wenhao Yang, Xiang Li, Zhihua Zhang
–Neural Information Processing Systems
We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term.
Neural Information Processing Systems
Oct-2-2025, 14:46:58 GMT
- Country:
- Asia > China
- North America
- Canada (0.04)
- United States > New Jersey
- Mercer County > Princeton (0.04)
- Technology: