Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs
Washim Uddin Mondal, Vaneet Aggarwal
arXiv.org Artificial Intelligence
A Constrained Markov Decision Process (CMDP) is a classical framework in which an agent repeatedly interacts with an unknown environment to maximize its cumulative discounted reward while ensuring that its cumulative observed cost stays within a predefined threshold. The framework applies to many practical scenarios. For example, consider an autonomous vehicle that tries to reach its destination via the shortest-time route without violating traffic rules, or a corporate leader who aims to maximize revenue without exceeding a budget. In such cases, any departure from the limits set by the predefined rules can be signaled by a cost, while progress toward the desired objective can be indicated by a reward. Finding an optimal policy for an unknown CMDP is difficult. Nevertheless, several recent articles have proposed algorithms that solve this problem with optimality guarantees.
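To make the objective concrete, a common way to write a CMDP is: maximize J_r(pi) = E[sum_t gamma^t r(s_t, a_t)] subject to J_c(pi) <= b, where J_c is the analogous discounted sum of costs and b is the cost budget. The notation here is a standard textbook form and may differ from the paper's. The sketch below evaluates both quantities for a given stochastic policy on a hypothetical two-state toy CMDP (the model `P`, `R`, `C`, the budget, and the helper names are all illustrative assumptions) using iterative policy evaluation, then checks the constraint. It does not implement the paper's algorithm; it only shows what "maximize reward while keeping cost within a budget" means numerically.

```python
from typing import Dict, List, Tuple

State = int
Action = int
# Transition model: P[s][a] lists (next_state, probability) pairs.
Transitions = Dict[State, Dict[Action, List[Tuple[State, float]]]]
# Per-step signal (reward or cost) for each (state, action) pair.
Signal = Dict[State, Dict[Action, float]]
# Stochastic policy: pi[s][a] is the probability of taking action a in state s.
Policy = Dict[State, Dict[Action, float]]


def discounted_value(P: Transitions, signal: Signal, pi: Policy,
                     rho: Dict[State, float], gamma: float,
                     tol: float = 1e-12) -> float:
    """Return E_{s0 ~ rho}[sum_t gamma^t * signal(s_t, a_t)] under policy pi.

    Uses iterative policy evaluation (repeated Bellman expectation
    backups), which converges because gamma < 1.
    """
    V = {s: 0.0 for s in P}
    while True:
        V_new = {}
        for s in P:
            V_new[s] = sum(
                prob_a * (signal[s][a]
                          + gamma * sum(p * V[s2] for s2, p in P[s][a]))
                for a, prob_a in pi[s].items()
            )
        delta = max(abs(V_new[s] - V[s]) for s in P)
        V = V_new
        if delta < tol:
            break
    return sum(weight * V[s] for s, weight in rho.items())


def evaluate(P: Transitions, R: Signal, C: Signal, pi: Policy,
             rho: Dict[State, float], gamma: float,
             budget: float) -> Tuple[float, float, bool]:
    """Return (reward value J_r, cost value J_c, whether J_c <= budget)."""
    j_r = discounted_value(P, R, pi, rho, gamma)
    j_c = discounted_value(P, C, pi, rho, gamma)
    return j_r, j_c, j_c <= budget


# Hypothetical toy CMDP: action 1 earns reward but also incurs cost.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
C = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 1.0}}
rho = {0: 1.0}  # always start in state 0
gamma, budget = 0.9, 5.0

greedy = {s: {1: 1.0} for s in P}         # always take the rewarding action
mixed = {s: {0: 0.6, 1: 0.4} for s in P}  # randomize between the two actions

print(evaluate(P, R, C, greedy, rho, gamma, budget))  # approx. reward 19, cost 10, violates budget
print(evaluate(P, R, C, mixed, rho, gamma, budget))   # approx. reward 5.44, cost 4, within budget
```

In this toy instance the reward-greedy policy exceeds the budget, while the randomized policy trades some reward for feasibility. That tension between the reward objective and the cost constraint is what CMDP algorithms are designed to resolve.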
Aug-21-2024