Undirected Networks
Lower Bounding Klondike Solitaire with Monte-Carlo Planning
Bjarnason, Ronald (Oregon State University) | Fern, Alan (Oregon State University) | Tadepalli, Prasad (Oregon State University)
Despite its ubiquitous presence, very little is known about the odds of winning the simple card game of Klondike Solitaire. The main goal of this paper is to investigate the use of probabilistic planning to shed light on this issue. Unfortunatley, most probabilistic planning techniques are not well suited for Klondike due to the difficulties of representing the domain in standard planning languages and the complexity of the required search. Klondike thus serves as an interesting addition to the complement of probabilistic planning domains. In this paper, we study Klondike using several sampling-based planning approaches including UCT, hindsight optimization, and sparse sampling, and establish lower bounds on their performance. We also introduce novel combinations of these approaches and evaluate them in Klondike. We provide a theoretical bound on the sample complexity of a method that naturally combines sparse sampling and UCT. Our results demonstrate that there is a policy that within tight confidence intervals wins over 35% of Klondike games. This result is the first reported lower bound of an optimal Klondike policy.
Incremental Policy Generation for Finite-Horizon DEC-POMDPs
Amato, Christopher (University of Massachusetts, Amherst) | Dibangoye, Jilles Steeve (Laval University) | Zilberstein, Shlomo (University of Massachusetts, Amherst)
Solving multiagent planning problems modeled as DEC-POMDPs is an important challenge. These models are often solved by using dynamic programming, but the high resource usage of current approaches results in limited scalability. To improve the efficiency of dynamic programming algorithms, we propose a new backup algorithm that is based on a reachability analysis of the state space. This method, which we call incremental policy generation, can be used to produce an optimal solution for any possible initial state or further scalability can be achieved by making use of a known start state. When incorporated into the optimal dynamic programming algorithm, our experiments show that planning horizon can be increased due to a marked reduction in resource consumption. This approach also fits nicely with approximate dynamic programming algorithms. To demonstrate this, we incorporate it into the state-of-the-art PBIP algorithm and show significant performance gains. The results suggest that the performance of other dynamic programming algorithms for DEC-POMDPs could be similarly improved by integrating the incremental policy generation approach.
Regret Bounds for Opportunistic Channel Access
Filippi, Sarah, Cappé, Olivier, Garivier, Aurélien
We consider the task of opportunistic channel access in a primary system composed of independent Gilbert-Elliot channels where the secondary (or opportunistic) user does not dispose of a priori information regarding the statistical characteristics of the system. It is shown that this problem may be cast into the framework of model-based learning in a specific class of Partially Observed Markov Decision Processes (POMDPs) for which we introduce an algorithm aimed at striking an optimal tradeoff between the exploration (or estimation) and exploitation requirements. We provide finite horizon regret bounds for this algorithm as well as a numerical evaluation of its performance in the single channel model as well as in the case of stochastically identical channels.
Optimal Value of Information in Graphical Models
Many real-world decision making tasks require us to choose among several expensive observations. In a sensor network, for example, it is important to select the subset of sensors that is expected to provide the strongest reduction in uncertainty. In medical decision making tasks, one needs to select which tests to administer before deciding on the most effective treatment. It has been general practice to use heuristic-guided procedures for selecting observations. In this paper, we present the first efficient optimal algorithms for selecting observations for a class of probabilistic graphical models. For example, our algorithms allow to optimally label hidden variables in Hidden Markov Models (HMMs). We provide results for both selecting the optimal subset of observations, and for obtaining an optimal conditional observation plan. Furthermore we prove a surprising result: In most graphical models tasks, if one designs an efficient algorithm for chain graphs, such as HMMs, this procedure can be generalized to polytree graphical models. We prove that the optimizing value of information is $NP^{PP}$-hard even for polytrees. It also follows from our results that just computing decision theoretic value of information objective functions, which are commonly used in practice, is a #P-complete problem even on Naive Bayes models (a simple special case of polytrees). In addition, we consider several extensions, such as using our algorithms for scheduling observation selection for multiple sensors. We demonstrate the effectiveness of our approach on several real-world datasets, including a prototype sensor network deployment for energy conservation in buildings.
Efficient Markov Network Structure Discovery Using Independence Tests
Bromberg, F., Margaritis, D., Honavar, V.
We present two algorithms for learning the structure of a Markov network from data: GSMN* and GSIMN. Both algorithms use statistical independence tests to infer the structure by successively constraining the set of structures consistent with the results of these tests. Until very recently, algorithms for structure learning were based on maximum likelihood estimation, which has been proved to be NP-hard for Markov networks due to the difficulty of estimating the parameters of the network, needed for the computation of the data likelihood. The independence-based approach does not require the computation of the likelihood, and thus both GSMN* and GSIMN can compute the structure efficiently (as shown in our experiments). GSMN* is an adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN* by additionally exploiting Pearl's well-known properties of the conditional independence relation to infer novel independences from known ones, thus avoiding the performance of statistical tests to estimate them. To accomplish this efficiently GSIMN uses the Triangle theorem, also introduced in this work, which is a simplified version of the set of Markov axioms. Experimental comparisons on artificial and real-world data sets show GSIMN can yield significant savings with respect to GSMN*, while generating a Markov network with comparable or in some cases improved quality. We also compare GSIMN to a forward-chaining implementation, called GSIMN-FCH, that produces all possible conditional independences resulting from repeatedly applying Pearl's theorems on the known conditional independence tests. The results of this comparison show that GSIMN, by the sole use of the Triangle theorem, is nearly optimal in terms of the set of independences tests that it infers.
Estimating the Impact of Public and Private Strategies for Controlling an Epidemic: A Multi-Agent Approach
Barrett, Christopher L. (Virginia Polytechnic Institute and State University) | Bisset, Keith (Virginia Polytechnic Institute and State University) | Leidig, Jonathan (Virginia Polytechnic Institute and State University) | Marathe, Achla (Virginia Polytechnic Institute and State University) | Marathe, Madhav (Virginia Polytechnic Institute and State University)
This paper describes a novel approach based on a combination of techniques in AI, parallel computing, and network science to address an important problem in social sciences and public health: planning and responding in the event of epidemics. Spread of infectious disease is an important societal problem -- human behavior, social networks, and the civil infrastructures all play a crucial role in initiating and controlling such epidemic processes. We specifically consider the economic and social effects of realistic interventions proposed and adopted by public health officials and behavioral changes of private citizens in the event of a ``flu-like'' epidemic. Our results provide new insights for developing robust public policies that can prove useful for epidemic planning.
A Bilinear Programming Approach for Multiagent Planning
Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and-unlike the coverage set algorithm-it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs.
Nonmyopic Adaptive Informative Path Planning for Multiple Robots
Singh, Amarjeet (University of California Los Angeles) | Krause, Andreas (Caltech) | Kaiser, William J. (University of California Los Angeles)
Many robotic path planning applications, such as search and rescue, involve uncertain environments with complex dynamics that can be only partially observed. When selecting the best subset of observation locations subject to constrained resources (such as limited time or battery capacity) it is an important problem to trade off exploration (gathering information about the environment) and exploitation (using the current knowledge about the environment most effectively) for efficiently observing these environments. Even the nonadaptive setting, where paths are planned before observations are made, is NP-hard, and has been subject to much research. In this paper, we present a novel approach to adaptive informative path planning that addresses this exploration-exploitation tradeoff. Our approach is nonmyopic, i.e. it plans ahead for possible observations that can be made in the future. We quantify the benefit of exploration through the “adaptivity gap” between an adaptive and a nonadaptive algorithm in terms of the uncertainty in the environment. Exploiting the submodularity (a diminishing returns property) and locality properties of the objective function, we develop an algorithm that performs provably near-optimally in settings where the adaptivity gap is small. In case of large gap, we use an objective function that simultaneously optimizes paths for exploration and exploitation. We also provide an algorithm to extend any single robot algorithm for adaptive informative path planning to the multi robot setting while approximately preserving the theoretical guarantee of the single robot algorithm. We extensively evaluate our approach on a search and rescue domain and a scientific monitoring problem using a real robotic system.
Greedy Algorithms for Sequential Sensing Decisions
Hajishirzi, Hannaneh (University of Illinois at Urbana-Champaign) | Shirazi, Afsaneh (University of Illinois at Urbana-Champaign) | Choi, Jaesik (University of Illinois at Urbana-Champaign) | Amir, Eyal (University of Illinois at Urbana-Champaign)
In many real-world situations we are charged with detecting change as soon as possible. Important examples include detecting medical conditions, detecting security breaches, and updating caches of distributed databases. In those situations, sensing can be expensive, but it is also important to detect change in a timely manner. In this paper we present tractable greedy algorithms and prove that they solve this decision problem either optimally or approximate the optimal solution in many cases. Our problem model is a POMDP that includes a cost for sensing, a cost for delayed detection, a reward for successful detection, and no-cost partial observations. Making optimal decisions is difficult in general. We show that our tractable greedy approach finds optimal policies for sensing both a single variable and multiple correlated variables. Further, we provide approximations for the optimal solution to multiple hidden or observed variables per step. Our algorithms outperform previous algorithms in experiments over simulated data and live Wikipedia WWW pages.
Speeding Up Inference in Markov Logic Networks by Preprocessing to Reduce the Size of the Resulting Grounded Network
Shavlik, Jude (University of Wisconsin Madison) | Natarajan, Sriraam (University of Wisconsin Madison)
Statistical-relational reasoning has received much attention due to its ability to robustly model complex relationships. A key challenge is tractable inference, especially in domains involving many objects, due to the combinatorics involved. One can accelerate inference by using approximation techniques, lazy algorithms, etc. We consider Markov Logic Networks (MLNs), which involve counting how often logical formulae are satisfied. We propose a preprocessing algorithm that can substantially reduce the effective size of MLNs by rapidly counting how often the evidence satisfies each formula, regardless of the truth values of the query literals. This is a general preprocessing method that loses no information and can be used for any MLN inference algorithm. We evaluate our algorithm empirically in three real-world domains, greatly reducing the work needed during subsequent inference. Such reduction might even allow exact inference to be performed when sampling methods would be otherwise necessary.