Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs

Aug-21-2024–arXiv.org Artificial Intelligence

The Constrained Markov Decision Process (CMDP) is a classical framework in which an agent repeatedly interacts with an unknown environment to maximize its cumulative discounted reward while ensuring that its cumulative observed cost stays within a predefined boundary. The framework applies to many practical scenarios. For example, consider an autonomous vehicle that attempts to reach its destination via the shortest-time route without violating traffic rules, or a corporate leader who aims to maximize revenue without exceeding a monetary budget. In these cases, any departure from the boundary set by the predefined rules can be signaled by a cost, while progress towards the desired objective can be indicated by a reward. Finding an optimal policy for an unknown CMDP is a difficult task. Nevertheless, several recent articles have proposed algorithms that solve this problem with optimality guarantees.
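To make the reward and cost accounting concrete, the following is a minimal sketch of how one finite trajectory could be scored against a CMDP-style constraint. It is not the paper's algorithm: the function names, the discount factor `gamma`, the use of a discounted (rather than undiscounted) cost sum, and the `cost_budget` threshold are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum over t of gamma**t * values[t] for one finite trajectory."""
    total = 0.0
    # Accumulate from the last step backwards (Horner-style evaluation).
    for v in reversed(values):
        total = v + gamma * total
    return total


def evaluate_trajectory(
    rewards: Sequence[float],
    costs: Sequence[float],
    gamma: float,
    cost_budget: float,  # hypothetical name for the predefined boundary
) -> Tuple[float, float, bool]:
    """Return (discounted reward, discounted cost, constraint satisfied?)."""
    total_reward = discounted_sum(rewards, gamma)
    total_cost = discounted_sum(costs, gamma)
    return total_reward, total_cost, total_cost <= cost_budget
```

Under these assumptions, a policy is preferable when it yields a higher discounted reward among those whose cost stays at or below the budget; a single trajectory only gives a sample estimate of either quantity.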



