Confident Natural Policy Gradient for Local Planning in q_π-realizable Constrained MDPs
Neural Information Processing Systems
The constrained Markov decision process (CMDP) framework has emerged as an important reinforcement-learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, how to learn efficiently in a CMDP environment with a potentially infinite number of states remains poorly understood, particularly when function approximation is applied to the value functions.
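To make the CMDP objective concrete, here is a toy sketch: maximize discounted reward subject to a discounted-cost budget, using a Lagrangian primal-dual scheme with a softmax policy and an NPG-style update. This is an illustrative assumption, not the paper's algorithm; the two-state environment, the function names, and all step sizes are invented for the example.

```python
import math

GAMMA = 0.9
S, A = (0, 1), (0, 1)
P = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}          # deterministic transitions
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}  # reward signal
C = {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 1.0}}  # constraint-cost signal
BUDGET = 2.0                                     # discounted-cost limit

def evaluate(pi, signal):
    """Fixed-point policy evaluation of a stochastic policy pi[s][a]."""
    V = {s: 0.0 for s in S}
    for _ in range(500):  # gamma^500 is negligible, so this converges
        V = {s: sum(pi[s][a] * (signal[s][a] + GAMMA * V[P[s][a]])
                    for a in A) for s in S}
    return V

def softmax(theta):
    """Softmax policy parameterization over action preferences theta."""
    pol = {}
    for s in S:
        z = {a: math.exp(theta[s][a]) for a in A}
        tot = sum(z.values())
        pol[s] = {a: z[a] / tot for a in A}
    return pol

def primal_dual_npg(iters=200, eta=0.1, beta=0.05):
    """Primal-dual loop: NPG step on the Lagrangian, dual ascent on lambda."""
    theta = {s: {a: 0.0 for a in A} for s in S}
    lam = 0.0
    for _ in range(iters):
        pi = softmax(theta)
        # Lagrangian signal L = R - lam * C
        L = {s: {a: R[s][a] - lam * C[s][a] for a in A} for s in S}
        VL = evaluate(pi, L)
        # softmax NPG update theta += eta * Q_L; adding the per-state
        # constant V_L to every action leaves the softmax policy unchanged,
        # so this matches the usual advantage-based update
        for s in S:
            for a in A:
                theta[s][a] += eta * (L[s][a] + GAMMA * VL[P[s][a]])
        # dual ascent on the constraint violation, projected to lam >= 0
        lam = max(0.0, lam + beta * (evaluate(pi, C)[0] - BUDGET))
    return softmax(theta), lam
```

The multiplier lam prices constraint violations: when the policy's discounted cost from the start state exceeds BUDGET, lam grows and tilts the Lagrangian reward against costly actions. The paper's setting replaces this tabular evaluation with value-function approximation, which is where the learning-efficiency question arises.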
May-25-2025, 08:41:22 GMT