The decentralized POMDP (Dec-POMDP) is an expressive model for multi-agent planning. Finite-state controllers (FSCs)---often used to represent policies for infinite-horizon problems---offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message-passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions.
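As a rough illustration of the dual (occupancy-measure) view of MDPs that this abstract builds on, the sketch below enumerates the deterministic policies of a toy 2-state MDP and scores each by its discounted state occupancy, the variable of the dual LP. The MDP itself (states, transitions, rewards) is invented for illustration and is not taken from the paper.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP (illustration only).
# Action 0 stays in the current state, action 1 flips it.
GAMMA = 0.9
P = [[0, 1], [1, 0]]           # P[s][a] = deterministic successor state
R = [[0.0, 1.0], [2.0, 0.0]]   # R[s][a] = immediate reward
MU = [1.0, 0.0]                # initial state distribution

def occupancy(pi, iters=2000):
    """Discounted state occupancy d(s) of deterministic policy pi,
    i.e. the fixed point of the dual LP's flow constraint:
    d(s') = mu(s') + gamma * sum_s P(s'|s, pi(s)) * d(s)."""
    d = [0.0, 0.0]
    for _ in range(iters):
        nd = list(MU)
        for s in range(2):
            nd[P[s][pi[s]]] += GAMMA * d[s]
        d = nd
    return d

def value(pi):
    # Dual objective: occupancy-weighted reward, summed over states.
    return sum(occupancy(pi)[s] * R[s][pi[s]] for s in range(2))

best = max(product(range(2), repeat=2), key=value)
```

In this toy instance the best policy flips out of state 0 and then stays in state 1, and its occupancy sums to 1/(1-gamma), the total mass that every dual-feasible point carries.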
Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) model planning under uncertainty for a single agent, and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. Policies for infinite-horizon POMDP and DEC-POMDP problems are often represented as finite-state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it into an initial periodic infinite-horizon policy. This policy is then optimized by a new infinite-horizon algorithm to yield deterministic periodic policies, and by a new expectation maximization algorithm to yield stochastic periodic policies. Our method yields better results than earlier planning methods and can compute larger solutions than methods based on regular FSCs.
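To make the layered structure concrete, here is a minimal sketch of executing a deterministic periodic FSC. The representation (a list of layers, each mapping a node and an observation to an action and a next node in the following layer) is our own illustrative choice, not the paper's implementation, and the two-layer "listen, then act" controller is a made-up example.

```python
class PeriodicFSC:
    """Deterministic periodic controller: layers[t][node][obs] gives
    (action, next_node), where next_node indexes a node in layer
    (t + 1) mod m, so transitions only go from each layer to the next."""
    def __init__(self, layers, start_node=0):
        self.layers = layers
        self.t = 0
        self.node = start_node

    def act(self, obs):
        action, self.node = self.layers[self.t][self.node][obs]
        self.t = (self.t + 1) % len(self.layers)  # wrap around: periodicity
        return action

# Hypothetical two-layer controller with one node per layer:
# layer 0 always listens, layer 1 acts on the observation just received.
fsc = PeriodicFSC([
    [{0: ("listen", 0), 1: ("listen", 0)}],
    [{0: ("left", 0), 1: ("right", 0)}],
])
trace = [fsc.act(o) for o in [0, 1, 0, 0]]
```

Because every node can only reach nodes in the next layer, the controller's behavior repeats with period m, which is what the finite-horizon-to-infinite-horizon conversion in the abstract exploits.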
A good human videographer is adept at selecting the best vantage points from which to film a subject. The aim of our research is to create an autonomous quadcopter that is similarly skilled at capturing good photographs of a moving subject. Due to their small size and mobility, quadcopters are well-suited to act as videographers and are capable of shooting from locations that are unreachable for a human. This paper evaluates the performance of two vantage-point selection strategies: 1) a reactive controller that tracks the subject's head pose, and 2) the reactive controller combined with a POMDP planner that reasons about the subject's movement intentions. Incorporating the POMDP planner into the system results in more stable footage and less quadcopter motion.
Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automaton, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones.
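As a hedged sketch of the distinction (the toy controllers below are ours, not the paper's, and the timing conventions are simplified): a Moore node's output depends only on the node it is in, while a Mealy machine selects its output from the node and the current observation. A one-node Mealy machine can therefore already "echo" its observation, a task for which a Moore machine needs one node per observation.

```python
class MooreFSC:
    """Moore machine: output attached to the node reached."""
    def __init__(self, action, trans, node=0):
        self.action = action   # action[node]: depends on the node only
        self.trans = trans     # trans[node][obs] = next node
        self.node = node

    def step(self, obs):
        self.node = self.trans[self.node][obs]
        return self.action[self.node]

class MealyFSC:
    """Mealy machine: output attached to the (node, observation) pair."""
    def __init__(self, out, trans, node=0):
        self.out = out         # out[node][obs]: depends on node AND obs
        self.trans = trans
        self.node = node

    def step(self, obs):
        a = self.out[self.node][obs]
        self.node = self.trans[self.node][obs]
        return a

# "Echo the observation": one Mealy node suffices, the Moore machine
# needs a node per observation to carry the same information.
mealy = MealyFSC(out=[["a0", "a1"]], trans=[[0, 0]])
moore = MooreFSC(action=["a0", "a1"], trans=[[0, 1], [0, 1]])
obs_seq = [0, 1, 1, 0]
mealy_trace = [mealy.step(o) for o in obs_seq]
moore_trace = [moore.step(o) for o in obs_seq]
```

The richer conditioning is what lets solution methods exploit Mealy structure: the same number of nodes encodes a strictly larger class of behaviors.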
Consider an agent that has a large supply of eggs and whose goal is to get three good eggs and no bad ones into one of two bowls. The eggs can be either good or bad, and at any time the agent can find out whether a bowl contains a bad egg by inspecting it (Levesque 1996). The task is to devise a controller for achieving the goal. This is a typical problem of planning with incomplete information, a situation that is very common for agents that must act in the real world (Moore 1985). In AI there have been two approaches to this problem. On the one hand, proposals focused on extensions of Strips planning languages and algorithms (e.g., Etzioni et al. 1992; Collins & Pryor 1995); on the other, proposals focused on formalizations of actions, knowledge, and their interaction (e.g., Moore 1985; Levesque 1996). In general, the two approaches haven't come together yet, and thus planners with incomplete information that both work and have a solid theoretical foundation are not common. In this paper we attack this problem from a different perspective.