Pajarinen, Joni K., Peltonen, Jaakko

Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it to an initial periodic infinite-horizon policy. This policy is optimized by a new infinite-horizon algorithm to yield deterministic periodic policies, and by a new expectation maximization algorithm to yield stochastic periodic policies. Our method yields better results than earlier planning methods and can compute larger solutions than with regular FSCs.

For decision-theoretic planning problems with an indefinite horizon, plan execution terminates after a finite number of steps with probability one, but the number of steps until termination (i.e., the horizon) is uncertain and unbounded. In the traditional approach to modeling such problems, called a stochastic shortest-path problem, plan execution terminates when a particular state is reached, typically a goal state. We consider a model in which plan execution terminates when a stopping action is taken. We show that an action-based model of termination has several advantages for partially observable planning problems. It does not require a goal state to be fully observable; it does not require achievement of a goal state to be guaranteed; and it allows a proper policy to be found more easily. This framework allows many partially observable planning problems to be modeled in a more realistic way that does not require an artificial discount factor.

Lusena, C., Goldsmith, J., Mundhenk, M.

We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to or simply don't have guarantees of finding policies within a constant factor or a constant summand of optimal. Here ``unlikely'' means ``unless some complexity classes collapse,'' where the collapses considered are P=NP, P=PSPACE, or P=EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.

Pajarinen, Joni Kristian (Aalto University) | Peltonen, Jaakko Tapani (Aalto University)

Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is restricted by costs or physical limits. DEC-POMDPs are a promising solution, but optimizing policies quickly becomes computationally intractable when problem size grows. Factored DEC-POMDPs allow large problems to be described in compact form, but have the same worst case complexity as non-factored DEC-POMDPs. We propose an efficient optimization algorithm for large factored infinite-horizon DEC-POMDPs. We formulate expectation-maximization based optimization into a new form, where complexity can be kept tractable by factored approximations. Our method performs well, and it can solve problems with more agents and larger state spaces than state of the art DEC-POMDP methods. We give results for factored infinite-horizon DEC-POMDP problems with up to 10 agents.

Chatterjee, Krishnendu (Institute of Science and Technology, Austria) | Chmelík, Martin (Institute of Science and Technology, Austria)

DEC-POMDPs extend POMDPs to a multi-agent setting, where several agents operate in an uncertain environment independently to achieve a joint objective. DEC-POMDPs have been studied with finite-horizon and infinite-horizon discounted-sum objectives, and there exist solvers both for exact and approximate solutions. In this work we consider Goal-DEC-POMDPs, where given a set of target states, the objective is to ensure that the target set is reached with minimal cost.We consider the indefinite-horizon (infinite-horizon with either discounted-sum, or undiscounted-sum, where absorbing goal states have zero-cost) problem. We present a new and novel method to solve the problem that extends methods for finite-horizon DEC-POMDPs and the RTDP-Bel approach for POMDPs. We present experimental results on several examples, and show that our approach presents promising results.