Undirected Networks
Map Matching with Inverse Reinforcement Learning
Osogami, Takayuki (IBM) | Raymond, Rudy (IBM)
We study map-matching, the problem of estimating the route that is traveled by a vehicle, where the points observed with the Global Positioning System are available. A state-of-the-art approach for this problem is a Hidden Markov Model (HMM). We propose a particular transition probability between latent road segments by the use of the number of turns in addition to the travel distance between the latent road segments. We use inverse reinforcement learning to estimate the importance of the number of turns relative to the travel distance. This estimated importance is incorporated in the transition probability of the HMM. We show, through numerical experiments, that the error of map-matching can be reduced substantially with the proposed transition probability.
Run-Time Improvement of Point-Based POMDP Policies
Wang, Minlue (University of Birmingham) | Dearden, Richard (University of Birmingham)
The most successful recent approaches to partially observable Markov decision problem (POMDP) solving have largely been point-based approximation algorithms. These work by selecting a finite number of belief points, computing alpha-vectors for those points, and using the resulting policy everywhere. However, if during execution the belief state is far from the points, there is no guarantee that the policy will be good. This case occurs either when the points are chosen poorly or there are too few points to capture the whole optimal policy, for example in domains where there are many low probability transitions, such as faults or exogenous events. In this paper we explore the use of an on-line plan repair approach to overcome this difficulty. The idea is to split computation between off-line plan creation and, if necessary, on-line plan repair. We evaluate a variety of heuristics used to determine when plan repair might be useful, and then repair the plan by sampling a small number of additional belief points and recomputing the policy. We show in several domains that the approach is more effective than either off-line planning alone even with much more computation time, or a purely on-line planning based on forward search. We also show that the overhead of checking the heuristics is very small when replanning is unnecessary.
Interactive POMDP Lite: Towards Practical Planning to Predict and Exploit Intentions for Interacting with Self-Interested Agents
Hoang, Trong Nghia (National University of Singapore) | Low, Kian Hsiang (National University of Singapore)
A key challenge in non-cooperative multi-agent systems is that of developing efficient planning algorithms for intelligent agents to interact and perform effectively among boundedly rational, self-interested agents (e.g., humans). The practicality of existing works addressing this challenge is being undermined due to either the restrictive assumptions of the other agents' behavior, the failure in accounting for their rationality, or the prohibitively expensive cost of modeling and predicting their intentions. To boost the practicality of research in this field, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intention-aware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions. We show that the performance losses incurred by the resulting planning policies are linearly bounded by the error of intention prediction. Empirical evaluations through a series of stochastic games demonstrate that our policies can achieve better and more robust performance than the state-of-the-art algorithms.
Isomorph-Free Branch and Bound Search for Finite State Controllers
Grzes, Marek (University of Waterloo) | Poupart, Pascal (University of Waterloo) | Hoey, Jesse (University of Waterloo)
The recent proliferation of smart-phones and other wearable devices has lead to a surge of new mobile applications. Partially observable Markov decision processes provide a natural framework to design applications that continuously make decisions based on noisy sensor measurements. However, given the limited battery life, there is a need to minimize the amount of online computation. This can be achieved by compiling a policy into a finite state controller since there is no need for belief monitoring or online search. In this paper, we propose a new branch and bound technique to search for a good controller. In contrast to many existing algorithms for controllers, our search technique is not subject to local optima. We also show how to reduce the amount of search by avoiding the enumeration of isomorphic controllers and by taking advantage of suitable upper and lower bounds. The approach is demonstrated on several benchmark problems as well as a smart-phone application to assist persons with Alzheimer's to wayfind.
Online Expectation Maximization for Reinforcement Learning in POMDPs
Liu, Miao (Duke University) | Liao, Xuejun (Duke University) | Carin, Lawrence (Duke University)
We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only in the current learning episode, discarding the episode after the evaluation and memorizing the sufficient statistic, from which the policy is computed in closed-form. As a result, the online algorithm has a time complexity O ( n ) and a memory complexity O (1), compared to O ( n 2 ) and O ( n ) for the corresponding batch-mode algorithm, where $n$ is the number of learning episodes. The online algorithm, which has a provable convergence, is demonstrated on five benchmark POMDP problems.
Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs
Yeoh, William (New Mexico State University) | Kumar, Akshat (IBM Research) | Zilberstein, Shlomo (University of Massachusetts, Amherst)
The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multi-agent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application.
Monte-Carlo Expectation Maximization for Decentralized POMDPs
Wu, Feng (University of Southampton) | Zilberstein, Shlomo (University of Massachusetts Amherst) | Jennings, Nicholas R. (University of Southampton)
We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM(MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents.
Bimodal Switching for Online Planning in Multiagent Settings
Sonu, Ekhlas (University of Georgia) | Doshi, Prashant (University of Georgia)
We present a bimodal method for online planning in partially observable multiagent settings as formalized by a finitely-nested interactive partially observable Markov decision process (I-POMDP). An agent planning in an environment shared with another updates beliefs both over the physical state and the other agents' models. In problems where we do not observe other's action explicitly but must infer it from sensing its effect on the state, observations are more informative about the other when the belief over the state space has reduced uncertainty. For typical, uncertain initial beliefs, we model the agent as if it were acting alone and utilize fast online planning for POMDPs. Subsequently, the agent switches to online planning in multiagent settings. We maintain tight lower and upper bounds at each step, and switch over when the difference between them reduces to less than ฮต.
Sufficient Plan-Time Statistics for Decentralized POMDPs
Oliehoek, Frans Adriaan (Maastricht University)
Optimal decentralized decision making in a team of cooperative agents as formalized by decentralized POMDPs is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that they need to base their actions on their histories of observations. A consequence is that even during off-line planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to have a dependence on the decisions made up to that point. This paper makes a contribution to the theory of decentralized POMDPs by showing how this dependence on the 'past joint policy' can be replaced by a sufficient statistic. These results are extended to the case of k-step delayed communication. The paper investigates the practical implications, as well as the effectiveness of a new pruning technique for MAA* methods, in a number of benchmark problems and discusses future avenues of research opened by these contributions.
An Intelligent Broker Agent for Energy Trading: An MDP Approach
Kuate, Rodrigue Talla (Aston University) | He, Minghua (Aston University) | Chli, Maria (Aston University) | Wang, Hai H. (Aston University)
This paper details the development and evaluation of AstonTAC, an energy broker that successfully participated in the 2012 Power Trading Agent Competition (Power TAC). AstonTAC buys electrical energy from the wholesale market and sells it in the retail market. The main focus of the paper is on the broker's bidding strategy in the wholesale market. In particular, it employs Markov Decision Processes (MDP) to purchase energy at low prices in a day-ahead power wholesale market, and keeps energy supply and demand balanced. Moreover, we explain how the agent uses Non-Homogeneous Hidden Markov Model (NHHMM) to forecast energy demand and price. An evaluation and analysis of the 2012 Power TAC finals show that AstonTAC is the only agent that can buy energy at low price in the wholesale market and keep energy imbalance low.