Undirected Networks
Coordinated Multi-Robot Exploration Under Communication Constraints Using Decentralized Markov Decision Processes
Matignon, Laetitia (Université de Caen Basse-Normandie) | Jeanpierre, Laurent (Université de Caen Basse-Normandie) | Mouaddib, Abdel-Illah (Université de Caen Basse-Normandie)
Recent work on multi-agent sequential decision making using decentralized partially observable Markov decision processes has been concerned with interaction-oriented resolution techniques and provides promising results. These techniques take advantage of local interactions and coordination. In this paper, we propose an approach based on an interaction-oriented resolution of decentralized decision makers. To this end, distributed value functions (DVF) have been used by decoupling the multi-agent problem into a set of individual agent problems. However, existing DVF techniques assume permanent and free communication between the agents. In this paper, we extend the DVF methodology to address full local observability, limited information sharing, and communication breaks. We apply our new DVF in a real-world application consisting of multi-robot exploration where each robot locally computes a strategy that minimizes the interactions between the robots and maximizes the space coverage of the team, even under communication constraints. Our technique has been implemented and evaluated in simulation and in real-world scenarios during a robotic challenge for the exploration and mapping of an unknown environment. Experimental results are given from real-world scenarios and from the challenge, in which our system finished as vice-champion.
Modeling Context Aware Dynamic Trust Using Hidden Markov Model
Liu, Xin (École Polytechnique Fédérale de Lausanne EPFL) | Datta, Anwitaman (Nanyang Technological University)
Modeling trust in complex dynamic environments is an important yet challenging issue since an intelligent agent may strategically change its behavior to maximize its profits. In this paper, we propose a context-aware trust model to predict dynamic trust by using a Hidden Markov Model (HMM) to model an agent's interactions. Although HMMs have already been applied in the past to model an agent's dynamic behavior to greatly improve the traditional static probabilistic trust approaches, most HMM-based trust models only focus on outcomes of the past interactions without considering interaction context, which, we believe, reflects immensely on the dynamic behavior or intent of an agent. Interaction contextual information is comprehensively studied and integrated into the model to more precisely approximate an agent's dynamic behavior. Evaluation using real auction data and synthetic data demonstrates the efficacy of our approach in comparison with previous state-of-the-art trust mechanisms.
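To make the HMM-based prediction concrete, here is a minimal Python sketch of forward filtering over a sequence of interaction outcomes. It is an illustration only, not the authors' model: the two hidden states (honest, dishonest), all probabilities, and the name `forward_predict` are assumptions made for this example, and interaction context is left out entirely.

```python
# Hypothetical two-state HMM: hidden state 0 = "honest", 1 = "dishonest".
# Observations: 1 = satisfactory interaction, 0 = unsatisfactory interaction.
INIT = [0.5, 0.5]                   # assumed prior over hidden states
TRANS = [[0.9, 0.1], [0.2, 0.8]]    # assumed TRANS[r][s] = P(next = s | current = r)
EMIT = [[0.1, 0.9], [0.7, 0.3]]     # assumed EMIT[s][o] = P(outcome = o | state = s)


def forward_predict(outcomes, init, trans, emit):
    """Return P(next outcome == 1 | past outcomes) under a discrete HMM."""
    n = len(init)
    belief = list(init)  # belief over the hidden state at the current interaction
    for o in outcomes:
        # Condition the belief on the observed outcome, then renormalize.
        belief = [belief[s] * emit[s][o] for s in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        # Push the belief one step forward through the transition model.
        belief = [sum(belief[r] * trans[r][s] for r in range(n)) for s in range(n)]
    # Predicted probability that the next interaction is satisfactory.
    return sum(belief[s] * emit[s][1] for s in range(n))


if __name__ == "__main__":
    print(forward_predict([1, 1, 0, 1], INIT, TRANS, EMIT))
```

In this sketch the predicted trust rises after a run of satisfactory outcomes and falls after unsatisfactory ones, because the belief shifts between the two hidden states.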
Advances in Lifted Importance Sampling
Gogate, Vibhav (The University of Texas at Dallas) | Jha, Abhay (University of Washington) | Venugopal, Deepak (The University of Texas at Dallas)
We consider lifted importance sampling (LIS), a previously proposed approximate inference algorithm for statistical relational learning (SRL) models. LIS achieves substantial variance reduction over conventional importance sampling by using various lifting rules that take advantage of the symmetry in the relational representation. However, it suffers from two drawbacks. First, it does not take advantage of some important symmetries in the relational representation and may exhibit needlessly high variance on models having these symmetries. Second, it uses an uninformative proposal distribution which adversely affects its accuracy. We propose two improvements to LIS that address these limitations. First, we identify a new symmetry in SRL models and define a lifting rule for taking advantage of this symmetry. The lifting rule reduces the variance of LIS. Second, we propose a new, structured approach for constructing and dynamically updating the proposal distribution via adaptive sampling. We demonstrate experimentally that our new, improved LIS algorithm is substantially more accurate than the LIS algorithm.
A Tractable First-Order Probabilistic Logic
Domingos, Pedro (University of Washington) | Webb, William Austin (University of Washington)
Tractable subsets of first-order logic are a central topic in AI research. Several of these formalisms have been used as the basis for first-order probabilistic languages. However, these are intractable, losing the original motivation. Here we propose the first non-trivially tractable first-order probabilistic language. It is a subset of Markov logic, and uses probabilistic class and part hierarchies to control complexity. We call it TML (Tractable Markov Logic). We show that TML knowledge bases allow for efficient inference even when the corresponding graphical models have very high treewidth. We also show how probabilistic inheritance, default reasoning, and other inference patterns can be carried out in TML. TML opens up the prospect of efficient large-scale first-order probabilistic inference.
Exact Lifted Inference with Distinct Soft Evidence on Every Object
Bui, Hung B. (SRI International) | Huynh, Tuyen N. (SRI International) | Braz, Rodrigo de Salvo (SRI International)
The presence of non-symmetric evidence has been a barrier for the application of lifted inference since the evidence destroys the symmetry of the first-order probabilistic model. In the extreme case, if distinct soft evidence is obtained about each individual object in the domain then, often, all current exact lifted inference methods reduce to traditional inference at the ground level. However, it is of interest to ask whether the symmetry of the model itself before evidence is obtained can be exploited. We present new results showing that this is, in fact, possible. In particular, we show that both exact maximum a posteriori (MAP) and marginal inference can be lifted for the case of distinct soft evidence on a unary Markov Logic predicate. Our methods result in efficient procedures for MAP and marginal inference for a class of graphical models previously thought to be intractable.
Lifted MEU by Weighted Model Counting
Apsel, Udi (Ben-Gurion University of The Negev) | Brafman, Ronen I. (Ben-Gurion University of The Negev)
Recent work in the field of probabilistic inference demonstrated the efficiency of weighted model counting (WMC) engines for exact inference in propositional and, very recently, first-order models. To date, these methods have not been applied to decision making models, propositional or first-order, such as influence diagrams and Markov decision networks (MDN). In this paper we show how this technique can be applied to such models. First, we show how WMC can be used to solve (propositional) MDNs. Then, we show how this can be extended to handle a first-order model, the Markov Logic Decision Network (MLDN). WMC offers two central benefits: it is a very simple and very efficient technique. This is particularly true for the first-order case, where the WMC approach is simpler conceptually, and, in many cases, more effective computationally than the existing methods for solving MLDNs via first-order variable elimination, or via propositionalization. We demonstrate the above empirically.
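As a toy illustration of maximum expected utility computed through weighted model counting, the Python sketch below enumerates all truth assignments by brute force. It is not a WMC engine and not the authors' encoding: the umbrella example, its weights and utilities, and the function names `weighted_sum` and `max_expected_utility` are assumptions made for this example.

```python
from itertools import product


def weighted_sum(variables, weights, formula, value=lambda a: 1.0):
    """Sum weight(model) * value(model) over all models of `formula`.

    With the default `value`, this is the weighted model count.
    """
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if formula(a):
            w = 1.0
            for var in variables:
                w *= weights[var][a[var]]  # weight of the literal chosen for var
            total += w * value(a)
    return total


def max_expected_utility(variables, weights, formula, utility, decision):
    """Try both values of one Boolean decision variable; return (best EU, best choice)."""
    best = None
    for choice in (False, True):
        fixed = lambda a, c=choice: formula(a) and a[decision] == c
        z = weighted_sum(variables, weights, fixed)            # normalizing count
        eu = weighted_sum(variables, weights, fixed, utility) / z
        if best is None or eu > best[0]:
            best = (eu, choice)
    return best


# Hypothetical example: rain is random, umbrella is the decision,
# and wet is determined by the other two variables.
VARIABLES = ["rain", "umbrella", "wet"]
WEIGHTS = {
    "rain": {True: 0.3, False: 0.7},
    "umbrella": {True: 1.0, False: 1.0},  # decision variable: neutral weights
    "wet": {True: 1.0, False: 1.0},       # deterministic variable: neutral weights
}
FORMULA = lambda a: a["wet"] == (a["rain"] and not a["umbrella"])
UTILITY = lambda a: (-10.0 if a["wet"] else 0.0) + (-1.0 if a["umbrella"] else 0.0)

if __name__ == "__main__":
    print(max_expected_utility(VARIABLES, WEIGHTS, FORMULA, UTILITY, "umbrella"))
```

Enumeration is exponential in the number of variables. The sketch shows only how expected utility reduces to ratios of weighted counts, not how an efficient engine would compute them.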
Efficient Approximate Value Iteration for Continuous Gaussian POMDPs
Berg, Jur van den (University of Utah) | Patil, Sachin (University of North Carolina at Chapel Hill) | Alterovitz, Ron (University of North Carolina at Chapel Hill)
We introduce a highly efficient method for solving continuous partially-observable Markov decision processes (POMDPs) in which beliefs can be modeled using Gaussian distributions over the state space. Our method enables fast solutions to sequential decision making under uncertainty for a variety of problems involving noisy or incomplete observations and stochastic actions. We present an efficient approach to compute locally-valid approximations to the value function over continuous spaces in time polynomial (O[n^4]) in the dimension n of the state space. To directly tackle the intractability of solving general POMDPs, we leverage the assumption that beliefs are Gaussian distributions over the state space, approximate the belief update using an extended Kalman filter (EKF), and represent the value function by a function that is quadratic in the mean and linear in the variance of the belief. Our approach iterates towards a linear control policy over the state space that is locally-optimal with respect to a user defined cost function, and is approximately valid in the vicinity of a nominal trajectory through belief space. We demonstrate the scalability and potential of our approach on problems inspired by robot navigation under uncertainty for state spaces of up to 128 dimensions.
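The Gaussian belief update mentioned above can be sketched in a few lines of Python for the simplest case: a scalar, linear model, where the extended Kalman filter reduces to the ordinary Kalman filter. The model coefficients, noise levels, and the names `kalman_belief_update` and `quadratic_value` are assumptions for illustration. The paper's method handles nonlinear, multi-dimensional systems and is not reproduced here.

```python
def kalman_belief_update(mean, var, u, z, a=1.0, b=1.0, q=0.1, h=1.0, r=0.5):
    """One belief update for x' = a*x + b*u + noise(q), z = h*x' + noise(r).

    The belief is a Gaussian described by (mean, var); q and r are the
    process and measurement noise variances (assumed values).
    """
    # Prediction step: propagate the belief through the dynamics.
    mean_pred = a * mean + b * u
    var_pred = a * a * var + q
    # Correction step: fold in the measurement z via the Kalman gain.
    gain = var_pred * h / (h * h * var_pred + r)
    mean_new = mean_pred + gain * (z - h * mean_pred)
    var_new = (1.0 - gain * h) * var_pred
    return mean_new, var_new


def quadratic_value(mean, var, s, t, c):
    """A value model quadratic in the belief mean and linear in its variance.

    The coefficients s, t, c are hypothetical placeholders.
    """
    return s * mean * mean + t * var + c


if __name__ == "__main__":
    belief = (0.0, 1.0)
    for control, measurement in [(1.0, 1.2), (1.0, 1.9), (0.0, 2.1)]:
        belief = kalman_belief_update(belief[0], belief[1], control, measurement)
        print(belief, quadratic_value(belief[0], belief[1], 1.0, 2.0, 0.0))
```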
POMDPs Make Better Hackers: Accounting for Uncertainty in Penetration Testing
Sarraute, Carlos (Core Security and ITBA) | Buffet, Olivier (INRIA and Université de Lorraine) | Hoffmann, Jörg (Saarland University)
Penetration Testing is a methodology for assessing network security by generating and executing possible hacking attacks. Doing so automatically allows for regular and systematic testing. A key question is how to generate the attacks. This is naturally formulated as planning under uncertainty, i.e., under incomplete knowledge about the network configuration. Previous work uses classical planning, and requires costly pre-processes reducing this uncertainty by extensive application of scanning methods. By contrast, we herein model the attack planning problem in terms of partially observable Markov decision processes (POMDP). This allows reasoning about the knowledge available, and intelligently employing scanning actions as part of the attack. As one would expect, this accurate solution does not scale. We devise a method that relies on POMDPs to find good attacks on individual machines, which are then composed into an attack on the network as a whole. This decomposition exploits network structure to the extent possible, making targeted approximations (only) where needed. Evaluating this method on a suitably adapted industrial test suite, we demonstrate its effectiveness in both runtime and solution quality.
LRTDP Versus UCT for Online Probabilistic Planning
Kolobov, Andrey (University of Washington, Seattle) | Mausam (University of Washington, Seattle) | Weld, Daniel S. (University of Washington, Seattle)
UCT, the premier method for solving games such as Go, is also becoming the dominant algorithm for probabilistic planning. Out of the five solvers at the International Probabilistic Planning Competition (IPPC) 2011, four were based on the UCT algorithm. However, while a UCT-based planner, PROST, won the contest, an LRTDP-based system, Glutton, came in a close second, outperforming other systems derived from UCT. These results raise a question: what are the strengths and weaknesses of LRTDP and UCT in practice? This paper starts answering this question by contrasting the two approaches in the context of finite-horizon MDPs. We demonstrate that in such scenarios, UCT's lack of a sound termination condition is a serious practical disadvantage. In order to handle an MDP with a large finite horizon under a time constraint, UCT forces an expert to guess a non-myopic lookahead value for which it should be able to converge on the encountered states. Mistakes in setting this parameter can greatly hurt UCT's performance. In contrast, LRTDP's convergence criterion allows for an iterative deepening strategy. Using this strategy, LRTDP automatically finds the largest lookahead value feasible under the given time constraint. As a result, LRTDP has better performance and stronger theoretical properties. We present an online version of Glutton, named Gourmand, that illustrates this analysis and outperforms PROST on the set of IPPC-2011 problems.
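The iterative deepening idea described above can be sketched in Python as follows. This is not LRTDP and not the Glutton or Gourmand implementation: it uses exhaustive finite-horizon backups with memoization, and a budget of Bellman backups stands in for the time constraint. The toy MDP and all names are assumptions made for this example.

```python
class BudgetExceeded(Exception):
    """Raised when the allowed number of Bellman backups is used up."""


def lookahead_value(mdp, state, horizon, counter, cache):
    """Return (optimal expected reward over `horizon` steps, best first action)."""
    if horizon == 0:
        return 0.0, None
    key = (state, horizon)
    if key in cache:
        return cache[key]
    if counter["left"] <= 0:
        raise BudgetExceeded
    counter["left"] -= 1  # one backup is spent on this (state, horizon) pair
    best_value, best_action = float("-inf"), None
    for action, outcomes in mdp[state].items():
        q = sum(p * (r + lookahead_value(mdp, nxt, horizon - 1, counter, cache)[0])
                for p, nxt, r in outcomes)
        if q > best_value:
            best_value, best_action = q, action
    cache[key] = (best_value, best_action)  # stored only once fully computed
    return cache[key]


def iterative_deepening(mdp, state, max_horizon, budget):
    """Return (largest lookahead solved within the budget, its recommended action)."""
    counter = {"left": budget}
    cache = {}
    result = (0, None)
    for horizon in range(1, max_horizon + 1):
        try:
            _, action = lookahead_value(mdp, state, horizon, counter, cache)
        except BudgetExceeded:
            break  # keep the answer from the last fully solved lookahead
        result = (horizon, action)
    return result


# Hypothetical MDP: mdp[state][action] = [(probability, next_state, reward), ...]
TOY_MDP = {
    "A": {"stay": [(1.0, "A", 1.0)], "move": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 3.0)]},
}

if __name__ == "__main__":
    for budget in (1, 3, 100):
        print(budget, iterative_deepening(TOY_MDP, "A", 5, budget))
```

In this toy problem a one-step lookahead prefers the myopic action `stay`, while any deeper lookahead prefers `move`. That is the kind of difference an automatically chosen lookahead depth is meant to capture.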
Action Selection for MDPs: Anytime AO* Versus UCT
Bonet, Blai (Universidad Simon Bolivar) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)
One of the natural approaches for selecting actions in very large state spaces is by performing a limited amount of lookahead. In the contexts of discounted MDPs, Kearns, Mansour, and Ng have shown that near to optimal actions can be selected by considering a sampled lookahead tree that is sufficiently sparse, whose size depends on the discount factor and the suboptimality bound but not on the number of problem states (Kearns, Mansour, and Ng 1999). The UCT algorithm (Kocsis and Szepesvári 2006) is a version of this form of Monte Carlo planning, where the lookahead trees are not grown depth-first but 'best-first', following a selection [...]
From this perspective, an algorithm like RTDP fails on two grounds: first, RTDP does not appear to make best use of short time windows in large state spaces; second, and more importantly, RTDP can use admissible heuristics but not informed base policies. On the other hand, algorithms like Policy Iteration (Howard 1971), deliver all of these features except one: they are exhaustive, and thus even to get started, they need vectors with the size of the state space. At the same time, while there are non-exhaustive versions of (asynchronous) Value Iteration such as RTDP, there are no similar 'focused' versions of Policy Iteration ensuring anytime optimality.