ρ-POMDPs have Lipschitz-Continuous ε-Optimal Value Functions
Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye
Neural Information Processing Systems
Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem--a belief MDP--and exploiting the piecewise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex). This approach has been extended to solving ρ-POMDPs--i.e., POMDPs with information-oriented criteria--when the reward ρ is convex in the belief. General ρ-POMDPs can also be turned into "fully observable" problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with λ_ρ-Lipschitz reward functions, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds.
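The Lipschitz property directly yields point-based bounds: if the optimal value is known (or bounded) at sampled beliefs b_i, then any λ-Lipschitz value function lies below the minimum of the downward cones v_i + λ·d(b, b_i) and above the maximum of the upward cones v_i - λ·d(b, b_i). The sketch below illustrates such cone-based approximators; it assumes a 1-norm metric on the belief simplex and uses hypothetical sample points, so it conveys the general idea rather than the paper's exact construction.

```python
import numpy as np

def lipschitz_bounds(b, samples, values, lam):
    """Cone-based bounds on a lam-Lipschitz value function.

    b       : query belief, shape (n_states,)
    samples : sampled beliefs, shape (k, n_states)
    values  : value estimates at the sampled beliefs, shape (k,)
    lam     : Lipschitz constant (e.g., derived from the reward's λ_ρ)

    Returns (lower, upper) with lower <= V(b) <= upper for any
    lam-Lipschitz V that agrees with `values` at `samples`.
    """
    d = np.abs(samples - b).sum(axis=1)   # 1-norm distances to each sample
    upper = np.min(values + lam * d)      # tightest min of downward cones
    lower = np.max(values - lam * d)      # tightest max of upward cones
    return lower, upper

# Toy usage on a 2-state belief simplex (hypothetical numbers):
samples = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
values = np.array([0.0, 1.0, 0.0])
lower, upper = lipschitz_bounds(np.array([0.25, 0.75]), samples, values, lam=2.0)
print(lower, upper)  # 0.0 1.0 -- the bounds bracket the true value
```

Adding a new sampled belief can only tighten these envelopes, which is the sense in which such bounds are uniformly improvable; the cones play a role analogous to the α-vectors and sawtooth bounds used by PWLC-based point-based solvers.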