Goto

Collaborating Authors

 Bayesian Inference


Bayesian Mean-parameterized Nonnegative Binary Matrix Factorization

arXiv.org Machine Learning

Binary data matrices can represent many types of data such as social networks, votes or gene expression. In some cases, the analysis of binary matrices can be tackled with nonnegative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually Bernoulli-distributed. Often, a link function is used to map the factorization to the $[0,1]$ range, ensuring a valid Bernoulli mean parameter. However, link functions have the potential disadvantage to lead to uninterpretable models. Mean-parameterized NMF, on the contrary, overcomes this problem. We propose a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF). We analyze three models which correspond to three possible constraints that respect the mean-parametrization without the need for link functions. Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors. Next, we extend the proposed models to a nonparametric setting where the number of used latent dimensions is automatically driven by the observed data. We analyze the performance of our NBMF methods in multiple datasets for different tasks such as dictionary learning and prediction of missing data. Experiments show that our methods provide similar or superior results than the state of the art, while automatically detecting the number of relevant components.


An Active Information Seeking Model for Goal-oriented Vision-and-Language Tasks

arXiv.org Machine Learning

As Computer Vision algorithms move from passive analysis of pixels to active reasoning over semantics, the breadth of information algorithms need to reason over has expanded significantly. One of the key challenges in this vein is the ability to identify the information required to make a decision, and select an action that will recover this information. We propose an reinforcement-learning approach that maintains an distribution over its internal information, thus explicitly representing the ambiguity in what it knows, and needs to know, towards achieving its goal. Potential actions are then generated according to particles sampled from this distribution. For each potential action a distribution of the expected answers is calculated, and the value of the information gained is obtained, as compared to the existing internal information. We demonstrate this approach applied to two vision-language problems that have attracted significant recent interest, visual dialogue and visual query generation. In both cases the method actively selects actions that will best reduce its internal uncertainty, and outperforms its competitors in achieving the goal of the challenge.


What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions

arXiv.org Artificial Intelligence

One of the core challenges in Visual Dialogue problems is asking the question that will provide the most useful information towards achieving the required objective. Encouraging an agent to ask the right questions is difficult because we don't know a-priori what information the agent will need to achieve its task, and we don't have an explicit model of what it knows already. We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output. By selecting the question that minimises the predicted regret with respect to this implicit model the agent actively reduces ambiguity. The Bayesian model of uncertainty also enables a principled method for identifying when enough information has been acquired, and an action should be selected. We evaluate our approach on two goal-oriented dialogue datasets, one for visual-based collaboration task and the other for a negotiation-based task. Our uncertainty-aware information-seeking model outperforms its counterparts in these two challenging problems.


Bayesian deep neural networks for low-cost neurophysiological markers of Alzheimer's disease severity

arXiv.org Machine Learning

As societies around the world are ageing, the number of Alzheimer's disease (AD) patients is rapidly increasing. To date, no low-cost, non-invasive biomarkers have been established to advance the objectivization of AD diagnosis and progression assessment. Here, we utilize Bayesian neural networks to develop a multivariate predictor for AD severity using a wide range of quantitative EEG (QEEG) markers. The Bayesian treatment of neural networks both automatically controls model complexity and provides a predictive distribution over the target function, giving uncertainty bounds for our regression task. It is therefore well suited to clinical neuroscience, where data sets are typically sparse and practitioners require a precise assessment of the predictive uncertainty. We use data of one of the largest prospective AD EEG trials ever conducted to demonstrate the potential of Bayesian deep learning in this domain, while comparing two distinct Bayesian neural network approaches, i.e., Monte Carlo dropout and Hamiltonian Monte Carlo.


Doubly Bayesian Optimization

arXiv.org Artificial Intelligence

Bayesian optimization (BO) is a powerful method for optimizing complex black-box functions that are costly to evaluate directly. Although useful out of the box, complexities arise when the domain exhibits non-smooth structure, noise, or greater than five dimensions. Extending BO for these issues is non-trivial, which is why we suggest casting BO methods into the probabilistic programming paradigm. These systems (PPS) enable users to encode model structure and naturally reason about uncertainties, which can be leveraged towards improved BO methods. Here we present a probabilistic domain-specific language where BO is native, showing the Bayesian approach to optimization is more naturally expressed in a PPS, and better equipped to address the above issues. We validate the approach on standard optimization benchmarks, while demonstrating the utility of programmable structure to address the inner-optimization problem of BO. Importantly, we also show that the framework enables the user to more readily use advanced techniques such as unscented BO and noisy expected improvement.


Recent Advances in Autoencoder-Based Representation Learning

arXiv.org Machine Learning

Learning useful representations with little or no supervision is a key challenge in artificial intelligence. We provide an in-depth review of recent advances in representation learning with a focus on autoencoder-based models. To organize these results we make use of meta-priors believed useful for downstream tasks, such as disentanglement and hierarchical organization of features. In particular, we uncover three main mechanisms to enforce such properties, namely (i) regularizing the (approximate or aggregate) posterior distribution, (ii) factorizing the encoding and decoding distribution, or (iii) introducing a structured prior distribution. While there are some promising results, implicit or explicit supervision remains a key enabler and all current methods use strong inductive biases and modeling assumptions. Finally, we provide an analysis of autoencoder-based representation learning through the lens of rate-distortion theory and identify a clear tradeoff between the amount of prior knowledge available about the downstream tasks, and how useful the representation is for this task.


Local Probabilistic Model for Bayesian Classification: a Generalized Local Classification Model

arXiv.org Machine Learning

In Bayesian classification, it is important to establish a probabilistic model for each class for likelihood estimation. Most of the previous methods modeled the probability distribution in the whole sample space. However, real-world problems are usually too complex to model in the whole sample space; some fundamental assumptions are required to simplify the global model, for example, the class conditional independence assumption for naive Bayesian classification. In this paper, with the insight that the distribution in a local sample space should be simpler than that in the whole sample space, a local probabilistic model established for a local region is expected much simpler and can relax the fundamental assumptions that may not be true in the whole sample space. Based on these advantages we propose establishing local probabilistic models for Bayesian classification. In addition, a Bayesian classifier adopting a local probabilistic model can even be viewed as a generalized local classification model; by tuning the size of the local region and the corresponding local model assumption, a fitting model can be established for a particular classification problem. The experimental results on several real-world datasets demonstrate the effectiveness of local probabilistic models for Bayesian classification.


Surrogate-assisted Bayesian inversion for landscape and basin evolution models

arXiv.org Machine Learning

The complex and computationally expensive features of the forward landscape and sedimentary basin evolution models pose a major challenge in the development of efficient inference and optimization methods. Bayesian inference provides a methodology for estimation and uncertainty quantification of free model parameters. In our previous work, parallel tempering Bayeslands was developed as a framework for parameter estimation and uncertainty quantification for the landscape and basin evolution modelling software Badlands. Parallel tempering Bayeslands features high-performance computing with dozens of processing cores running in parallel to enhance computational efficiency. Although parallel computing is used, the procedure remains computationally challenging since thousands of samples need to be drawn and evaluated. In large-scale landscape and basin evolution problems, a single model evaluation can take from several minutes to hours, and in certain cases, even days. Surrogate-assisted optimization has been with successfully applied to a number of engineering problems. This motivates its use in optimisation and inference methods suited for complex models in geology and geophysics. Surrogates can speed up parallel tempering Bayeslands by developing computationally inexpensive surrogates to mimic expensive models. In this paper, we present an application of surrogate-assisted parallel tempering where that surrogate mimics a landscape evolution model including erosion, sediment transport and deposition, by estimating the likelihood function that is given by the model. We employ a machine learning model as a surrogate that learns from the samples generated by the parallel tempering algorithm. The results show that the methodology is effective in lowering the overall computational cost significantly while retaining the quality of solutions.


Encoding prior knowledge in the structure of the likelihood

arXiv.org Machine Learning

The inference of deep hierarchical models is problematic due to strong dependencies between the hierarchies. We investigate a specific transformation of the model parameters based on the multivariate distributional transform. This transformation is a special form of the reparametrization trick, flattens the hierarchy and leads to a standard Gaussian prior on all resulting parameters. The transformation also transfers all the prior information into the structure of the likelihood, hereby decoupling the transformed parameters a priori from each other. A variational Gaussian approximation in this standardized space will be excellent in situations of relatively uninformative data. Additionally, the curvature of the log-posterior is well-conditioned in directions that are weakly constrained by the data, allowing for fast inference in such a scenario. In an example we perform the transformation explicitly for Gaussian process regression with a priori unknown correlation structure. Deep models are inferred rapidly in highly and slowly in poorly informed situations. The flat model show exactly the opposite performance pattern. A synthesis of both, the deep and the flat perspective, provides their combined advantages and overcomes the individual limitations, leading to a faster inference.


Spiking Neural Networks: A Stochastic Signal Processing Perspective

arXiv.org Machine Learning

Spiking Neural Networks (SNNs) are distributed systems whose computing elements, or neurons, are characterized by analog internal dynamics and by digital and sparse inter-neuron, or synaptic, communications. The sparsity of the synaptic spiking inputs and the corresponding event-driven nature of neural processing can be leveraged by hardware implementations to obtain significant energy reductions as compared to conventional Artificial Neural Networks (ANNs). SNNs can be used not only as coprocessors tocarry out given computing tasks, such as classification, but also as learning machines that adapt their internal parameters, e.g., their synaptic weights, on the basis of data and of a learning criterion. This paper provides an overview of models, learning rules, and applications of SNNs from the viewpoint of stochastic signal processing. INTRODUCTION Artificial Neural Networks (ANNs) have become the de-facto standard tool to carry out supervised, unsupervised, and reinforcement learning tasks. Their recent successes range from image classifiers that outperform human experts in medical diagnosis to machines that defeat professional players at complex games such as Go.