A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

Kalagarla, Krishna C., Jain, Rahul, Nuzzo, Pierluigi

arXiv.org Artificial Intelligence 

Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finitehorizon CMDP for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an ǫ-optimal policy, i.e., with resulting objective value within ǫ of the optimal value and satisfying the constraints within ǫ-tolerance, with probability at least 1 δ. S, the number of episodes needed have a linear dependence on the state and action space sizes S and A, respectively, and quadratic dependence on the time horizon H. Markov decision processes (MDPs) [1] offer a natural framework to express sequential decision-making problems and reason about autonomous system behaviors. However, the single cost objective of a traditional MDP formulation may fall short of fully capturing problems with multiple conflicting objectives and additional constraints that must be satisfied.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found