Undirected Networks
Deterministic versus Probabilistic Methods for Searching for an Evasive Target
Bernardini, Sara (Royal Holloway University of London) | Fox, Maria (King's College London) | Long, Derek (King's College London) | Piacentini, Chiara (University of Toronto)
Several advanced applications of autonomous aerial vehicles in civilian and military contexts involve a searching agent with imperfect sensors that seeks to locate a mobile target in a given region. Effectively managing uncertainty is key to solving the related search problem, which is why all methods devised so far hinge on a probabilistic formulation of the problem and solve it through branch-and-bound algorithms, Bayesian filtering or POMDP solvers. In this paper, we consider a class of hard search tasks involving a target that exhibits an intentional evasive behaviour and moves over a large geographical area, i.e., a target that is particularly difficult to track down and uncertain to locate. We show that, even for such a complex problem, it is advantageous to compile its probabilistic structure into a deterministic model and use standard deterministic solvers to find solutions. In particular, we formulate the search problem for our uncooperative target both as a deterministic automated planning task and as a constraint programming task and show that in both cases our solution outperforms POMDPs methods.
Accelerated Vector Pruning for Optimal POMDP Solvers
Walraven, Erwin (Delft University of Technology) | Spaan, Matthijs T. J. (Delft University of Technology)
Partially Observable Markov Decision Processes (POMDPs) are powerful models for planning under uncertainty in partially observable domains. However, computing optimal solutions for POMDPs is challenging because of the high computational requirements of POMDP solution algorithms. Several algorithms use a subroutine to prune dominated vectors in value functions, which requires a large number of linear programs (LPs) to be solved and it represents a large part of the total running time. In this paper we show how the LPs in POMDP pruning subroutines can be decomposed using a Benders decomposition. The resulting algorithm incrementally adds LP constraints and uses only a small fraction of the constraints. Our algorithm significantly improves the performance of existing pruning methods and the commonly used incremental pruning algorithm. Our new variant of incremental pruning is the fastest optimal pruning-based POMDP algorithm.
Optimizing Quantiles in Preference-Based Markov Decision Processes
Gilbert, Hugo (Pierre and Marie Curie University) | Weng, Paul (Sun Yat-sen University) | Xu, Yan (Carnegie Mellon University)
In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally we experimentally evaluate our approach on random MDPs and on a data center control problem.
Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs
Nijs, Frits de (Delft University of Technology) | Walraven, Erwin (Delft University of Technology) | Weerdt, Mathijs M. de (Delft University of Technology) | Spaan, Matthijs T. J. (Delft University of Technology)
Multi-agent planning problems with constraints on global resource consumption occur in several domains. Existing algorithms for solving Multi-agent Markov Decision Processes can compute policies that meet a resource constraint in expectation, but these policies provide no guarantees on the probability that a resource constraint violation will occur. We derive a method to bound constraint violation probabilities using Hoeffding's inequality. This method is applied to two existing approaches for computing policies satisfying constraints: the Constrained MDP framework and a Column Generation approach. We also introduce an algorithm to adaptively relax the bound up to a given maximum violation tolerance. Experiments on a hard toy problem show that the resulting policies outperform static optimal resource allocations to an arbitrary level. By testing the algorithms on more realistic planning domains from the literature, we demonstrate that the adaptive bound is able to efficiently trade off violation probability with expected value, outperforming state-of-the-art planners.
Greedy Flipping for Constrained Word Deletion
Yao, Jin-ge (Peking University) | Wan, Xiaojun (Peking University)
In this paper we propose a simple yet efficient method for constrained word deletion to compress sentences, based on top-down greedy local flipping from multiple random initializations. The algorithm naturally integrates various grammatical constraints in the compression process, without using time-consuming integer linear programming solvers. Our formulation suits for any objective function involving arbitrary local score definition. Experimental results show that the proposed method achieves nearly identical performance with explicit ILP formulation while being much more efficient.
A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
Serban, Iulian Vlad (University of Montreal) | Sordoni, Alessandro (Maluuba Inc) | Lowe, Ryan (McGill University) | Charlin, Laurent (HEC Montréal) | Pineau, Joelle (McGill University ) | Courville, Aaron (University of Montreal) | Bengio, Yoshua (University of Montreal)
Sequential data often possesses hierarchical structures with complex dependencies between sub-sequences, such as found between the utterances in a dialogue. To model these dependencies in a generative framework, we propose a neural network-based generative architecture, with stochastic latent variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with other recent neural-network architectures. We evaluate the model performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and maintaining dialogue state.
Maximum Reconstruction Estimation for Generative Latent-Variable Models
Cheng, Yong (Tsinghua University) | Liu, Yang (Tsinghua University) | Xu, Wei (Tsinghua University)
Generative latent-variable models are important for natural language processing due to their capability of providing compact representations of data. As conventional maximum likelihood estimation (MLE) is prone to focus on explaining irrelevant but common correlations in data, we apply maximum reconstruction estimation (MRE) to learning generative latent-variable models alternatively, which aims to find model parameters that maximize the probability of reconstructing the observed data. We develop tractable algorithms to directly learn hidden Markov models and IBM translation models using the MRE criterion, without the need to introduce a separate reconstruction model to facilitate efficient inference. Experiments on unsupervised part-of-speech induction and unsupervised word alignment show that our approach enables generative latent-variable models to better discover intended correlations in data and outperforms maximum likelihood estimators significantly.
Collective Multiagent Sequential Decision Making Under Uncertainty
Nguyen, Duc Thien (Singapore Management University) | Kumar, Akshat (Singapore Management University) | Lau, Hoong Chuin (Singapore Management University)
Multiagent sequential decision making has seen rapid progress with formal models such as decentralized MDPs and POMDPs. However, scalability to large multiagent systems and applicability to real world problems remain limited. To address these challenges, we study multiagent planning problems where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our work exploits recent advances in graphical models for modeling and inference with a population of individuals such as collective graphical models and the notion of finite partial exchangeability in lifted inference. We develop a collective decentralized MDP model where policies can be computed based on counts of agents in different states. As the policy search space over counts is combinatorial, we develop a sampling based framework that can compute open and closed loop policies. Comparisons with previous best approaches on synthetic instances and a real world taxi dataset modeling supply-demand matching show that our approach significantly outperforms them w.r.t.solution quality.
Column Networks for Collective Classification
Pham, Trang (Deakin University) | Tran, Truyen (Deakin University) | Phung, Dinh (Deakin University) | Venkatesh, Svetha (Deakin University)
Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds a great promise to produce a better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged on the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient with linear complexity in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all of these applications, CLN demonstrates a higher accuracy than state-of-the-art rivals.
Sampling Beats Fixed Estimate Predictors for Cloning Stochastic Behavior in Multiagent Systems
Hrolenok, Brian (Georgia Institute of Technology) | Boots, Byron (Georgia Institute of Technology) | Balch, Tucker Hybinette (Georgia Institute of Technology)
Modeling stochastic multiagent behavior such as fish schooling is challenging for fixed-estimate prediction techniques because they fail to reliably reproduce the stochastic aspects of the agents’ behavior. We show how standard fixed-estimate predictors fit within a probabilistic framework, and suggest the reason they work for certain classes of behaviors and not others. We quantify the degree of mismatch and offer alternative sampling-based modeling techniques. We are specifically interested in building executable models (as opposed to statistical or descriptive models) because we want to reproduce and study multiagent behavior in simulation. Such models can be used by biologists, sociologists, and economists to explain and predict individual and group behavior in novel scenarios, and to test hypotheses regarding group behavior. Developing models from observation of real systems is an obvious application of machine learning. Learning directly from data eliminates expensive hand processing and tuning, but introduces unique challenges that violate certain assumptions common in standard machine learning approaches. Our framework suggests a new class of sampling-based methods, which we implement and apply to simulated deterministic and stochastic schooling behaviors, as well as the observed schooling behavior of real fish. Experimental results show that our implementation performs comparably with standard learning techniques for deterministic behaviors, and better on stochastic behaviors.