Probabilistic Goal Markov Decision Processes
Xu, Huan (National University of Singapore) | Mannor, Shie (Technion)
In contrast to the studied in single-period optimization [Miller and Wagner, standard approach that studies the expected performance, 1965; Prékopa, 1970]. However, little has been done in we consider the policy that maximizes the context of sequential decision problem including MDPs. the probability of achieving a predetermined target The standard approaches in risk-averse MDPs include maximization performance, a criterion we term probabilistic of expected utility function [Bertsekas, 1995], goal Markov decision processes. We show that and optimization of a coherent risk measure [Riedel, 2004; this problem is NPhard, but can be solved using a Le Tallec, 2007]. Both approaches lead to formulations that pseudo-polynomial algorithm. We further consider can not be solved in polynomial time, except for special a variant dubbed "chance-constraint Markov decision cases including exponential utility function [Chung and Sobel, problems," that treats the probability of achieving 1987], piecewise linear utility function with a single target performance as a constraint instead of the break down point [Liu and Koenig, 2005], and risk measures maximizing objective. This variant is NPhard, but that can be reduced to robust MDPs satisfying the socalled can be solved in pseudo-polynomial time.
Jul-19-2011
- Country:
- North America > United States
- New York (0.04)
- Asia
- Middle East > Israel (0.04)
- Singapore > Central Region
- Singapore (0.04)
- North America > United States
- Technology: