Goto

Collaborating Authors

 Bayesian Inference


Digital Neural Networks in the Brain: From Mechanisms for Extracting Structure in the World To Self-Structuring the Brain Itself

arXiv.org Artificial Intelligence

In order to keep trace of information, the brain has to resolve the problem where information is and how to index new ones. We propose that the neural mechanism used by the prefrontal cortex (PFC) to detect structure in temporal sequences, based on the temporal order of incoming information, has served as second purpose to the spatial ordering and indexing of brain networks. We call this process, apparent to the manipulation of neural 'addresses' to organize the brain's own network, the 'digitalization' of information. Such tool is important for information processing and preservation, but also for memory formation and retrieval.


Model Evidence with Fast Tree Based Quadrature

arXiv.org Machine Learning

High dimensional integration is essential to many areas of science, ranging from particle physics to Bayesian inference. Approximating these integrals is hard, due in part to the difficulty of locating and sampling from regions of the integration domain that make significant contributions to the overall integral. Here, we present a new algorithm called Tree Quadrature (TQ) that separates this sampling problem from the problem of using those samples to produce an approximation of the integral. TQ places no qualifications on how the samples provided to it are obtained, allowing it to use state-of-the-art sampling algorithms that are largely ignored by existing integration algorithms. Given a set of samples, TQ constructs a surrogate model of the integrand in the form of a regression tree, with a structure optimised to maximise integral precision. The tree divides the integration domain into smaller containers, which are individually integrated and aggregated to estimate the overall integral. Any method can be used to integrate each individual container, so existing integration methods, like Bayesian Monte Carlo, can be combined with TQ to boost their performance. On a set of benchmark problems, we show that TQ provides accurate approximations to integrals in up to 15 dimensions; and in dimensions 4 and above, it outperforms simple Monte Carlo and the popular Vegas method (Lepage, 1978).


Global Optimization of Gaussian processes

arXiv.org Machine Learning

Gaussian processes~(Kriging) are interpolating data-driven models that are frequently applied in various disciplines. Often, Gaussian processes are trained on datasets and are subsequently embedded as surrogate models in optimization problems. These optimization problems are nonconvex and global optimization is desired. However, previous literature observed computational burdens limiting deterministic global optimization to Gaussian processes trained on few data points. We propose a reduced-space formulation for deterministic global optimization with trained Gaussian processes embedded. For optimization, the branch-and-bound solver branches only on the degrees of freedom and McCormick relaxations are propagated through explicit Gaussian process models. The approach also leads to significantly smaller and computationally cheaper subproblems for lower and upper bounding. To further accelerate convergence, we derive envelopes of common covariance functions for GPs and tight relaxations of acquisition functions used in Bayesian optimization including expected improvement, probability of improvement, and lower confidence bound. In total, we reduce computational time by orders of magnitude compared to state-of-the-art methods, thus overcoming previous computational burdens. We demonstrate the performance and scaling of the proposed method and apply it to Bayesian optimization with global optimization of the acquisition function and chance-constrained programming. The Gaussian process models, acquisition functions, and training scripts are available open-source within the "MeLOn - Machine Learning Models for Optimization" toolbox~(https://git.rwth-aachen.de/avt.svt/public/MeLOn).


Information Acquisition Under Resource Limitations in a Noisy Environment

arXiv.org Artificial Intelligence

We introduce a theoretical model of information acquisition under resource limitations in a noisy environment. An agent must guess the truth value of a given Boolean formula $\varphi$ after performing a bounded number of noisy tests of the truth values of variables in the formula. We observe that, in general, the problem of finding an optimal testing strategy for $\phi$ is hard, but we suggest a useful heuristic. The techniques we use also give insight into two apparently unrelated, but well-studied problems: (1) \emph{rational inattention}, that is, when it is rational to ignore pertinent information (the optimal strategy may involve hardly ever testing variables that are clearly relevant to $\phi$), and (2) what makes a formula hard to learn/remember.


Effective Learning of a GMRF Mixture Model

arXiv.org Machine Learning

Learning a Gaussian Mixture Model (GMM) is hard when the number of parameters is too large given the amount of available data. As a remedy, we propose restricting the GMM to a Gaussian Markov Random Field Mixture Model (GMRF-MM), as well as a new method for estimating the latter's sparse precision (i.e., inverse covariance) matrices. When the sparsity pattern of each matrix is known, we propose an efficient optimization method for the Maximum Likelihood Estimate (MLE) of that matrix. When it is unknown, we utilize the popular Graphical LASSO (GLASSO) to estimate that pattern. However, we show that even for a single Gaussian, when GLASSO is tuned to successfully estimate the sparsity pattern, it does so at the price of a substantial bias of the values of the nonzero entries of the matrix, and we show that this problem only worsens in a mixture setting. To overcome this, we discard the non-zero values estimated by GLASSO, keep only its pattern estimate and use it within the proposed MLE method. This yields an effective two-step procedure that removes the bias. We show that our "debiasing" approach outperforms GLASSO in both the single-GMRF and the GMRF-MM cases. We also show that when learning priors for image patches, our method outperforms GLASSO even if we merely use an educated guess about the sparsity pattern, and that our GMRF-MM outperforms the baseline GMM on real and synthetic high-dimensional datasets. Our code is available at \url{https://github.com/shahaffind/GMRF-MM}.


Heuristic AND/OR Search for Solving Influence Diagram

AAAI Conferences

An influence diagram is a graphical representation of sequential decision-making under uncertainty, defining a structured decision problem by conditional probability functions and additive utility functions over discrete state and action variables. The task of finding the maximum expected utility of influence diagrams is closely related to the cost-optimal probabilistic planning, stochastic programmings, or model-based reinforcement learning. In this position paper, we address the heuristic search for solving influence diagram, where we generate admissible heuristic functions from graph decomposition schemes. Then, we demonstrate how such heuristics can guide an AND/OR branch and bound search. Finally, we briefly discuss the future directions for improving the quality of heuristic functions and search strategies.


Applications of Probabilistic Programming (Master's thesis, 2015)

arXiv.org Artificial Intelligence

This thesis describes work on two applications of probabilistic programming: the learning of probabilistic program code given specifications, in particular program code of one-dimensional samplers; and the facilitation of sequential Monte Carlo inference with help of data-driven proposals. The latter is presented with experimental results on a linear Gaussian model and a non-parametric dependent Dirichlet process mixture of objects model for object recognition and tracking. In Chapter 1 we provide a brief introduction to probabilistic programming. In Chapter 2 we present an approach to automatic discovery of samplers in the form of probabilistic programs. We formulate a Bayesian approach to this problem by specifying a grammar-based prior over probabilistic program code. We use an approximate Bayesian computation method to learn the programs, whose executions generate samples that statistically match observed data or analytical characteristics of distributions of interest. In our experiments we leverage different probabilistic programming systems to perform Markov chain Monte Carlo sampling over the space of programs. Experimental results have demonstrated that, using the proposed methodology, we can learn approximate and even some exact samplers. Finally, we show that our results are competitive with regard to genetic programming methods. In Chapter 3, we describe a way to facilitate sequential Monte Carlo inference in probabilistic programming using data-driven proposals. In particular, we develop a distance-based proposal for the non-parametric dependent Dirichlet process mixture of objects model. We implement this approach in the probabilistic programming system Anglican, and show that for that model data-driven proposals provide significant performance improvements. We also explore the possibility of using neural networks to improve data-driven proposals.


Sparse Methods for Automatic Relevance Determination

arXiv.org Machine Learning

This work considers methods for imposing sparsity in Bayesian regression with applications in nonlinear system identification. We first review automatic relevance determination (ARD) and analytically demonstrate the need to additional regularization or thresholding to achieve sparse models. We then discuss two classes of methods, regularization based and thresholding based, which build on ARD to learn parsimonious solutions to linear problems. In the case of orthogonal covariates, we analytically demonstrate favorable performance with regards to learning a small set of active terms in a linear system with a sparse solution. Several example problems are presented to compare the set of proposed methods in terms of advantages and limitations to ARD in bases with hundreds of elements. The aim of this paper is to analyze and understand the assumptions that lead to several algorithms and to provide theoretical and empirical results so that the reader may gain insight and make more informed choices regarding sparse Bayesian regression.


An Analysis of the Adaptation Speed of Causal Models

arXiv.org Machine Learning

We consider the problem of discovering the causal process that generated a collection of datasets. We assume that all these datasets were generated by unknown sparse interventions on a structural causal model (SCM) $G$, that we want to identify. Recently, Bengio et al. (2020) argued that among all SCMs, $G$ is the fastest to adapt from one dataset to another, and proposed a meta-learning criterion to identify the causal direction in a two-variable SCM. While the experiments were promising, the theoretical justification was incomplete. Our contribution is a theoretical investigation of the adaptation speed of simple two-variable SCMs. We use convergence rates from stochastic optimization to justify that a relevant proxy for adaptation speed is distance in parameter space after intervention. Using this proxy, we show that the SCM with the correct causal direction is advantaged for categorical and normal cause-effect datasets when the intervention is on the cause variable. When the intervention is on the effect variable, we provide a more nuanced picture which highlights that the fastest-to-adapt heuristic is not always valid. Code to reproduce experiments is available at https://github.com/remilepriol/causal-adaptation-speed


Improving the Effectiveness of Traceability Link Recovery using Hierarchical Bayesian Networks

arXiv.org Artificial Intelligence

Traceability is a fundamental component of the modern software development process that helps to ensure properly functioning, secure programs. Due to the high cost of manually establishing trace links, researchers have developed automated approaches that draw relationships between pairs of textual software artifacts using similarity measures. However, the effectiveness of such techniques are often limited as they only utilize a single measure of artifact similarity and cannot simultaneously model (implicit and explicit) relationships across groups of diverse development artifacts. In this paper, we illustrate how these limitations can be overcome through the use of a tailored probabilistic model. To this end, we design and implement a HierarchiCal PrObabilistic Model for SoftwarE Traceability (Comet) that is able to infer candidate trace links. Comet is capable of modeling relationships between artifacts by combining the complementary observational prowess of multiple measures of textual similarity. Additionally, our model can holistically incorporate information from a diverse set of sources, including developer feedback and transitive (often implicit) relationships among groups of software artifacts, to improve inference accuracy. We conduct a comprehensive empirical evaluation of Comet that illustrates an improvement over a set of optimally configured baselines of $\approx$14% in the best case and $\approx$5% across all subjects in terms of average precision. The comparative effectiveness of Comet in practice, where optimal configuration is typically not possible, is likely to be higher. Finally, we illustrate Comets potential for practical applicability in a survey with developers from Cisco Systems who used a prototype Comet Jenkins plugin.