A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints
Kalagarla, Krishna C., Jain, Rahul, Nuzzo, Pierluigi
–arXiv.org Artificial Intelligence
Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finitehorizon CMDP for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an ǫ-optimal policy, i.e., with resulting objective value within ǫ of the optimal value and satisfying the constraints within ǫ-tolerance, with probability at least 1 δ. S, the number of episodes needed have a linear dependence on the state and action space sizes S and A, respectively, and quadratic dependence on the time horizon H. Markov decision processes (MDPs) [1] offer a natural framework to express sequential decision-making problems and reason about autonomous system behaviors. However, the single cost objective of a traditional MDP formulation may fall short of fully capturing problems with multiple conflicting objectives and additional constraints that must be satisfied.
arXiv.org Artificial Intelligence
Sep-23-2020
- Country:
- North America > United States
- New York > New York County
- New York City (0.04)
- California > Los Angeles County
- Los Angeles (0.14)
- Long Beach (0.04)
- New York > New York County
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Genre:
- Research Report (0.40)