A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

Kalagarla, Krishna C., Jain, Rahul, Nuzzo, Pierluigi

Sep-23-2020–arXiv.org Artificial Intelligence

Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finitehorizon CMDP for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an ǫ-optimal policy, i.e., with resulting objective value within ǫ of the optimal value and satisfying the constraints within ǫ-tolerance, with probability at least 1 δ. S, the number of episodes needed have a linear dependence on the state and action space sizes S and A, respectively, and quadratic dependence on the time horizon H. Markov decision processes (MDPs) [1] offer a natural framework to express sequential decision-making problems and reason about autonomous system behaviors. However, the single cost objective of a traditional MDP formulation may fall short of fully capturing problems with multiple conflicting objectives and additional constraints that must be satisfied.

ln 4, log 2, probability, (15 more...)

arXiv.org Artificial Intelligence

Sep-23-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - California > Los Angeles County
    - Los Angeles (0.14)
    - Long Beach (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found