Uncertainty
Accelerated First-order Methods on the Wasserstein Space for Bayesian Inference
Liu, Chang, Zhuo, Jingwei, Cheng, Pengyu, Zhang, Ruiyi, Zhu, Jun, Carin, Lawrence
We consider doing Bayesian inference by minimizing the KL divergence on the 2-Wasserstein space $\mathcal{P}_2$. By exploring the Riemannian structure of $\mathcal{P}_2$, we develop two inference methods that simulate the gradient flow on $\mathcal{P}_2$ by updating particles, and an acceleration method that speeds up all such particle-simulation-based inference methods. Moreover, we analyze the approximation flexibility of such methods and devise a novel bandwidth selection method for the kernel that they use. We note that $\mathcal{P}_2$ is abstract and general enough for our methods to make closer approximations, while still having a rich structure that enables practical implementation. Experiments show the effectiveness of the two proposed methods and the improvement of convergence by the acceleration method.
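To make "updating particles" with a kernel concrete, the following is a minimal sketch of Stein variational gradient descent (SVGD), a well-known particle-based inference method that relies on a kernel bandwidth. It is not one of the two methods proposed in the paper, and the bandwidth rule shown is the conventional median heuristic, not the paper's new selection method. The function name `svgd_1d`, the one-dimensional standard normal target, the step size, and the number of steps are all assumptions chosen for illustration.

```python
import math


def svgd_1d(particles, grad_log_p, step_size=0.1, n_steps=1000):
    """Move 1-D particles toward a target density with plain SVGD updates.

    Illustrative only: RBF kernel k(a, b) = exp(-(a - b)^2 / h) with the
    median-heuristic bandwidth h = median^2 / log(n + 1).
    """
    x = list(particles)
    n = len(x)
    for _ in range(n_steps):
        # Median heuristic: bandwidth from the median pairwise distance.
        dists = sorted(abs(a - b) for i, a in enumerate(x) for b in x[i + 1:])
        median = dists[len(dists) // 2]
        h = max(median ** 2 / math.log(n + 1), 1e-6)

        grads = [grad_log_p(v) for v in x]
        updated = []
        for xi in x:
            phi = 0.0
            for xj, gj in zip(x, grads):
                k = math.exp(-((xj - xi) ** 2) / h)
                # Attraction: kernel-weighted gradient of the log target.
                # Repulsion: gradient of the kernel w.r.t. xj, pushing xi away.
                phi += k * gj + k * 2.0 * (xi - xj) / h
            updated.append(xi + step_size * phi / n)
        x = updated
    return x


# Assumed target: standard normal, so grad log p(v) = -v.
initial = [4.0 + 0.1 * i for i in range(20)]
final = svgd_1d(initial, lambda v: -v)
final_mean = sum(final) / len(final)
final_spread = max(final) - min(final)
```

The particles start far from the target, drift toward its mode through the attraction term, and are kept apart by the repulsion term, whose strength depends directly on the bandwidth `h`.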
Learning models for visual 3D localization with implicit mapping
Rosenbaum, Dan, Besse, Frederic, Viola, Fabio, Rezende, Danilo J., Eslami, S. M. Ali
We propose a formulation of visual localization that does not require construction of explicit maps in the form of point clouds or voxels. The goal is to learn an implicit representation of the environment at a higher, more abstract level, for instance that of objects. To study this approach we consider procedurally generated Minecraft worlds, for which we can generate visually rich images along with camera pose coordinates. We first show that Generative Query Networks (GQNs) enhanced with a novel attention mechanism can capture the visual structure of 3D scenes in Minecraft, as evidenced by their samples. We then apply the models to the localization problem, investigating both generative and discriminative approaches, and compare the different ways in which they each capture task uncertainty. Our results show that models with implicit mapping are able to capture the underlying 3D structure of visually complex scenes, and use this to accurately localize new observations, paving the way towards future applications in sequential localization. Supplementary video available at https://youtu.be/iHEXX5wXbCI.
When Gaussian Process Meets Big Data: A Review of Scalable GPs
Liu, Haitao, Ong, Yew-Soon, Shen, Xiaobo, Cai, Jianfei
The vast quantity of information brought by big data, together with evolving computer hardware, has driven success stories in the machine learning community. At the same time, it poses challenges for the Gaussian process (GP), a well-known non-parametric and interpretable Bayesian model whose training cost is cubic in the training size. To improve scalability while retaining the desirable prediction quality, a variety of scalable GPs have been presented. But they have not yet been comprehensively reviewed and discussed in a unifying way so as to be well understood by both academia and industry. To this end, this paper is devoted to reviewing state-of-the-art scalable GPs in two main categories: global approximations, which distill the entire data, and local approximations, which divide the data for subspace learning. In particular, for global approximations, we mainly focus on sparse approximations, comprising prior approximations, which modify the prior but perform exact inference, and posterior approximations, which retain the exact prior but perform approximate inference; for local approximations, we highlight the mixture/product of experts, which conducts model averaging over multiple local experts to boost predictions. To present a complete review, recent advances for improving the scalability and model capability of scalable GPs are reviewed. Finally, the extensions and open issues regarding the implementation of scalable GPs in various scenarios are reviewed and discussed to inspire novel ideas for future research avenues.
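As a rough illustration of the product-of-experts idea mentioned above, the sketch below combines the Gaussian predictions of several local experts at one test point by multiplying their densities, which amounts to adding precisions. It is a generic product of Gaussians, not a specific variant from the review; the function names and the example numbers are assumptions. The second helper only does the arithmetic behind the cubic cost: splitting n points evenly among M experts changes the dominant cost from n^3 to M(n/M)^3.

```python
def product_of_experts(means, variances):
    """Combine independent Gaussian predictions N(mean_i, var_i) for one test point.

    The product of Gaussian densities is Gaussian: precisions add, and the
    combined mean is the precision-weighted average of the expert means.
    """
    precisions = [1.0 / v for v in variances]
    combined_precision = sum(precisions)
    combined_mean = sum(p * m for p, m in zip(precisions, means)) / combined_precision
    return combined_mean, 1.0 / combined_precision


def cubic_cost_ratio(n_points, n_experts):
    """Ratio of the full-GP cost n^3 to the local cost M * (n / M)^3."""
    full_cost = n_points ** 3
    local_cost = n_experts * (n_points / n_experts) ** 3
    return full_cost / local_cost


# Two hypothetical experts that disagree on the mean but are equally confident.
poe_mean, poe_var = product_of_experts([1.0, 3.0], [1.0, 1.0])
speedup = cubic_cost_ratio(10_000, 10)
```

With equally confident experts the combined mean is their midpoint and the variance halves; the cost ratio equals the square of the number of experts.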
Playing against Nature: causal discovery for decision making under uncertainty
Gonzalez-Soto, M., Sucar, L. E., Escalante, H. J.
We consider decision problems under uncertainty where the options available to a decision maker and the resulting outcome are related through a causal mechanism which is unknown to the decision maker. We ask how a decision maker can learn about this causal mechanism through sequential decision making, while also using her current causal knowledge within each round to make better choices than she would have made without it. We propose a decision making procedure in which an agent holds \textit{beliefs} about her environment, which are used to make a choice and are updated using the observed outcome. As proof of concept, we present an implementation of this causal decision making model and apply it in a simple scenario. We show that the model achieves a performance similar to that of classic Q-learning while also acquiring a causal model of the environment.
Diagonal Discriminant Analysis with Feature Selection for High Dimensional Data
Romanes, Sarah Elizabeth, Ormerod, John Thomas, Yang, Jean YH
Classification problems involving high dimensional data are common in many fields such as finance, marketing, and bioinformatics. The challenges of high dimensional datasets are numerous and well known, and many classifiers built under traditional low dimensional frameworks simply cannot be applied to such data. Discriminant Analysis (DA) is one such example (Fisher, 1936). DA classifiers work by assuming that the distribution of the features is strictly Gaussian at the class level, and assign a particular point to the class label which minimises the Mahalanobis distance (in the case of linear discriminant analysis (LDA)) between that point and the mean of the multivariate normal corresponding to that class. Although extraordinarily simple and easy to use in low dimensional settings, DA is well known to be unusable in high dimensions because the maximum likelihood estimate of the corresponding covariance matrix is singular when the number of features is greater than the number of observations.
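The sketch below shows why restricting the covariance to its diagonal sidesteps the singularity problem: only per-feature variances are estimated, so the Mahalanobis distance reduces to a variance-scaled squared distance that stays well defined even with as many features as observations. This is a plain diagonal LDA with pooled variances, written under our own assumptions; it does not include the feature selection of the paper, and the names `fit_diagonal_lda` and `predict_diagonal_lda`, the variance floor, and the toy data are hypothetical.

```python
import math


def fit_diagonal_lda(X, y):
    """Estimate class means, pooled per-feature variances, and class priors."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[c] = len(rows) / n
    # Pooled within-class sum of squares, one entry per feature.
    pooled = [0.0] * p
    for x, label in zip(X, y):
        for j in range(p):
            pooled[j] += (x[j] - means[label][j]) ** 2
    # Small floor (an assumption) guards against zero-variance features.
    variances = [max(s / (n - len(classes)), 1e-8) for s in pooled]
    return means, variances, priors


def predict_diagonal_lda(model, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    means, variances, priors = model

    def score(c):
        d2 = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], variances))
        return d2 - 2.0 * math.log(priors[c])

    return min(means, key=score)


# Toy data: 4 observations, 4 features, 2 classes. The full pooled covariance
# has rank at most n - K = 2 here, so it would be singular.
X_train = [
    [0.0, 0.1, 0.2, 0.1],
    [0.2, 0.0, 0.0, 0.3],
    [1.0, 1.1, 0.9, 1.2],
    [0.8, 0.9, 1.1, 1.0],
]
y_train = [0, 0, 1, 1]
model = fit_diagonal_lda(X_train, y_train)
label_low = predict_diagonal_lda(model, [0.1, 0.1, 0.1, 0.1])
label_high = predict_diagonal_lda(model, [1.0, 1.0, 1.0, 1.0])
```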
Hypertree Decompositions Revisited for PGMs
Arun, Aarthy Shivram, Jayaraman, Sai Vikneshwar Mani, Ré, Christopher, Rudra, Atri
We revisit the classical problem of exact inference on probabilistic graphical models (PGMs). Our algorithm is based on recent \emph{worst-case optimal database join} algorithms, which can be asymptotically faster than traditional data processing methods. We present the first empirical evaluation of these algorithms via JoinInfer -- a new exact inference engine. We empirically explore the properties of the data for which our engine can be expected to outperform traditional inference engines, refining current theoretical notions. Further, JoinInfer outperforms existing state-of-the-art inference engines (ACE, IJGP and libDAI) on some standard benchmark datasets by up to a factor of 630x. Finally, we propose a promising data-driven heuristic that extends JoinInfer to automatically tailor its parameters and/or switch to the traditional inference algorithms.
Answering Hindsight Queries with Lifted Dynamic Junction Trees
Gehrke, Marcel, Braun, Tanya, Möller, Ralf
The lifted dynamic junction tree algorithm (LDJT) efficiently answers filtering and prediction queries for probabilistic relational temporal models by building and then reusing a first-order cluster representation of a knowledge base for multiple queries and time steps. We extend LDJT to (i) solve the smoothing inference problem to answer hindsight queries by introducing an efficient backward pass and (ii) discuss different options to instantiate a first-order cluster representation during a backward pass. Further, our relational forward backward algorithm makes hindsight queries to the very beginning feasible. LDJT answers multiple temporal queries faster than the static lifted junction tree algorithm on an unrolled model, which performs smoothing during message passing.
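For readers unfamiliar with smoothing, the sketch below runs the classical forward-backward algorithm on a small propositional hidden Markov model. It is only a ground, non-lifted analogue of the forward and backward passes described above and does not use first-order cluster representations; the function name, the two-state model, and its probabilities are assumptions chosen for illustration.

```python
def forward_backward(prior, transition, emission, observations):
    """Return filtered and smoothed state marginals for a discrete HMM.

    prior[i]          : P(state_0 = i) before the first observation
    transition[i][j]  : P(state_t = j | state_{t-1} = i)
    emission[j][o]    : P(obs_t = o | state_t = j)
    """
    n_states = len(prior)
    steps = len(observations)

    # Forward pass: filtering, P(state_t | obs_0..t).
    filtered = []
    for t, obs in enumerate(observations):
        if t == 0:
            predicted = list(prior)
        else:
            previous = filtered[-1]
            predicted = [
                sum(previous[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        unnorm = [predicted[j] * emission[j][obs] for j in range(n_states)]
        z = sum(unnorm)
        filtered.append([u / z for u in unnorm])

    # Backward pass: smoothing, P(state_t | obs_0..T).
    smoothed = [None] * steps
    beta = [1.0] * n_states
    for t in range(steps - 1, -1, -1):
        unnorm = [filtered[t][i] * beta[i] for i in range(n_states)]
        z = sum(unnorm)
        smoothed[t] = [u / z for u in unnorm]
        # Fold observation t into the backward message for step t - 1.
        obs = observations[t]
        beta = [
            sum(transition[i][j] * emission[j][obs] * beta[j] for j in range(n_states))
            for i in range(n_states)
        ]
        scale = sum(beta)
        beta = [b / scale for b in beta]  # rescale for numerical stability
    return filtered, smoothed


# Hypothetical two-state model: state 0 = rain, state 1 = no rain;
# observation 0 = umbrella seen, observation 1 = no umbrella.
filtered, smoothed = forward_backward(
    prior=[0.5, 0.5],
    transition=[[0.7, 0.3], [0.3, 0.7]],
    emission=[[0.9, 0.1], [0.2, 0.8]],
    observations=[0, 0],
)
```

At the last time step the smoothed and filtered marginals coincide; for earlier steps the backward message revises the estimate using later evidence, which is what a hindsight query asks for.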
Preventing Unnecessary Groundings in the Lifted Dynamic Junction Tree Algorithm
Gehrke, Marcel, Braun, Tanya, Möller, Ralf
The lifted dynamic junction tree algorithm (LDJT) efficiently answers filtering and prediction queries for probabilistic relational temporal models by building and then reusing a first-order cluster representation of a knowledge base for multiple queries and time steps. Unfortunately, a non-ideal elimination order can lead to groundings even though a lifted run is possible for a model. We extend LDJT (i) to identify unnecessary groundings while proceeding in time and (ii) to prevent groundings by delaying eliminations through changes in a temporal first-order cluster representation. The extended version of LDJT answers multiple temporal queries orders of magnitude faster than the original version.
Logical Explanations for Deep Relational Machines Using Relevance Information
Srinivasan, Ashwin, Vig, Lovekesh, Bain, Michael
Our interest in this paper is in the construction of symbolic explanations for predictions made by a deep neural network. We will focus attention on deep relational machines (DRMs, first proposed by H. Lodhi). A DRM is a deep network in which the input layer consists of Boolean-valued functions (features) that are defined in terms of relations provided as domain, or background, knowledge. Our DRMs differ from those proposed by Lodhi, which use an Inductive Logic Programming (ILP) engine to first select features (we use random selections from a space of features that satisfies some approximate constraints on logical relevance and non-redundancy). But why do the DRMs predict what they do? One way of answering this is the LIME setting, in which readable proxies are constructed for a black-box predictor. The proxies are intended only to model the predictions of the black-box in local regions of the instance-space. But readability alone may not be enough: to be understandable, the local models must use relevant concepts in a meaningful manner. We investigate the use of a Bayes-like approach to identify logical proxies for local predictions of a DRM. We show: (a) DRMs with our randomised propositionalization method achieve state-of-the-art predictive performance; (b) Models in first-order logic can approximate the DRM's prediction closely in a small local region; and (c) Expert-provided relevance information can play the role of a prior to distinguish between logical explanations that perform equivalently on prediction alone.
Fusing First-order Knowledge Compilation and the Lifted Junction Tree Algorithm
Standard approaches for inference in probabilistic formalisms with first-order constructs include lifted variable elimination (LVE) for single queries as well as first-order knowledge compilation (FOKC) based on weighted model counting. To handle multiple queries efficiently, the lifted junction tree algorithm (LJT) uses a first-order cluster representation of a model and LVE as a subroutine in its computations. For certain inputs, the implementations of LVE and, as a result, LJT ground parts of a model where FOKC has a lifted run. The purpose of this paper is to prepare LJT as a backbone for lifted inference and to allow any exact inference algorithm to be used as a subroutine. Using FOKC in LJT allows us to compute answers faster than LJT, LVE, and FOKC for certain inputs. AI areas such as natural language understanding and machine learning need efficient inference algorithms.