Goto

Collaborating Authors

 Bayesian Inference


How to Improve Political Forecasts - Issue 70: Variables

Nautilus

The 2020 Democratic candidates are out of the gate and the pollsters have the call! Bernie Sanders is leading by two lengths with Kamala Harris and Elizabeth Warren right behind, but Cory Booker and Beto O'Rourke are coming on fast! The political horse-race season is upon us and I bet I know what you are thinking: "Stop!" Every election we complain about horse-race coverage and every election we stay glued to it all the same. The problem with this kind of coverage is not that it's unimportant.


Scalable Data Augmentation for Deep Learning

arXiv.org Machine Learning

Scalable Data Augmentation (SDA) provides a framework for training deep neural networks (DNNs). Our methodology exploits auxiliary hidden units which are designed to avoid backtracking and traverse local modes in an efficient way. This allows us to exploit recent advantages in high performance computing such as scalable linear algebra (CUDA, XLA). We show how to implement standard activation and objective functions, including ReLU (Polson and Roฤkovรก, 2018), logit (Zhou et al., 2012) and SVM (Mallick et al., 2005) are all available as data augmentation schemes. Data augmentation strategies are commonplace in statistical applications such as EM, ECM and MM algorithms, as they accelerate convergence and can use Nesterov acceleration (Nesterov, 1983).


Variational Bayesian modelling of mixed-effects

arXiv.org Machine Learning

This note is concerned with an accurate and computationally efficient variational bayesian treatment of mixed-effects modelling. We focus on group studies, i.e. empirical studies that report multiple measurements acquired in multiple subjects. When approached from a bayesian perspective, such mixed-effects models typically rely upon a hierarchical generative model of the data, whereby both within- and between-subject effects contribute to the overall observed variance. The ensuing VB scheme can be used to assess statistical significance at the group level and/or to capture inter-individual differences. Alternatively, it can be seen as an adaptive regularization procedure, which iteratively learns the corresponding within-subject priors from estimates of the group distribution of effects of interest (cf. so-called "empirical bayes" approaches). We outline the mathematical derivation of the ensuing VB scheme, whose open-source implementation is available as part the VBA toolbox.


Transferability of Operational Status Classification Models Among Different Wind Turbine Typesq

arXiv.org Machine Learning

A detailed understanding of wind turbine performance status classification can improve operations and maintenance in the wind energy industry. Due to different engineering properties of wind turbines, the standard supervised learning models used for classification do not generalize across data sets obtained from different wind sites. We propose two methods to deal with the transferability of the trained models: first, data normalization in the form of power curve alignment, and second, a robust method based on convolutional neural networks and feature-space extension. We demonstrate the success of our methods on real-world data sets with industrial applications. Keywords: Machine learning, classification, generalization, CNN, wind turbine, wind energy 1. Introduction Classification of operational status is an important step for performance analysis of wind farms from data of SCADA (Supervisory Control and Data Acquisition) type.


The Binary Space Partitioning-Tree Process

arXiv.org Artificial Intelligence

The Mondrian process represents an elegant and powerful approach for space partition modelling. However, as it restricts the partitions to be axis-aligned, its modelling flexibility is limited. In this work, we propose a self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the Mondrian process. The BSP-Tree process is an almost surely right continuous Markov jump process that allows uniformly distributed oblique cuts in a two-dimensional convex polygon. The BSP-Tree process can also be extended using a non-uniform probability measure to generate direction differentiated cuts. The process is also self-consistent, maintaining distributional invariance under a restricted subdomain. We use Conditional-Sequential Monte Carlo for inference using the tree structure as the high-dimensional variable. The BSP-Tree process's performance on synthetic data partitioning and relational modelling demonstrates clear inferential improvements over the standard Mondrian process and other related methods.


TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications

arXiv.org Machine Learning

The fundamental role of neural networks (NNs) is readily apparent from their widespread use in machine learning in applications such as natural language processing [72], social network analysis [26], medical diagnosis [6, 35], vision systems [66], and robotic path planning [44]. The greatest success of these models lies in their flexibility, their ability to represent complex, nonlinear relationships in high-dimensional data sets, and the availability of frameworks that allow NNs to be implemented on rapidly evolving GPU platforms [40, 29]. The industrial appetite for deep learning has led to very rapid expansion of the subject in recent years, although, as pointed out by Dunson [19], at times the mathematical and theoretical understanding of these methods has been swept aside in the rush to advance the methodology. The potential impact on society of machine learning algorithms demands that their exposition and use be subject to the highest standards of clarity, ease of interpretation, and uncertainty quantification. Typical NN training seeks to optimize the parameters of the network (biases and weights) under the constraint that the training data set is well approximated [28, 23].


Combining Model and Parameter Uncertainty in Bayesian Neural Networks

arXiv.org Machine Learning

Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques. There are several advantages of using Bayesian approach: Parameter and prediction uncertainty become easily available, facilitating rigid statistical analysis. Furthermore, prior knowledge can be incorporated. However so far there have been no scalable techniques capable of combining both model (structural) and parameter uncertainty. In this paper we introduce the concept of model uncertainty in BNNs and hence make inference in the joint space of models and parameters. Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrization of marginal inclusion probabilities to incorporate the model space constraints. Finally, we show that incorporating model uncertainty via Bayesian model averaging and Bayesian model selection allows to drastically sparsify the structure of BNNs without significant loss of predictive power.


High-Dimensional Bernoulli Autoregressive Process with Long-Range Dependence

arXiv.org Machine Learning

We consider the problem of estimating the parameters of a multivariate Bernoulli process with auto-regressive feedback in the high-dimensional setting where the number of samples available is much less than the number of parameters. This problem arises in learning interconnections of networks of dynamical systems with spiking or binary-valued data. We allow the process to depend on its past up to a lag $p$, for a general $p \ge 1$, allowing for more realistic modeling in many applications. We propose and analyze an $\ell_1$-regularized maximum likelihood estimator (MLE) under the assumption that the parameter tensor is approximately sparse. Rigorous analysis of such estimators is made challenging by the dependent and non-Gaussian nature of the process as well as the presence of the nonlinearities and multi-level feedback. We derive precise upper bounds on the mean-squared estimation error in terms of the number of samples, dimensions of the process, the lag $p$ and other key statistical properties of the model. The ideas presented can be used in the high-dimensional analysis of regularized $M$-estimators for other sparse nonlinear and non-Gaussian processes with long-range dependence.


Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets

arXiv.org Machine Learning

Learning predictive models from small high-dimensional data sets is a key problem in high-dimensional statistics. Expert knowledge elicitation can help, and a strong line of work focuses on directly eliciting informative prior distributions for parameters. This either requires considerable statistical expertise or is laborious, as the emphasis has been on accuracy and not on efficiency of the process. Another line of work queries about importance of features one at a time, assuming them to be independent and hence missing covariance information. In contrast, we propose eliciting expert knowledge about pairwise feature similarities, to borrow statistical strength in the predictions, and using sequential decision making techniques to minimize the effort of the expert. Empirical results demonstrate improvement in predictive performance on both simulated and real data, in high-dimensional linear regression tasks, where we learn the covariance structure with a Gaussian process, based on sequential elicitation.


Doubly Semi-Implicit Variational Inference

arXiv.org Machine Learning

We extend the existing framework of semi-implicit variational inference (SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way to perform variational inference and learning when both the approximate posterior and the prior distribution are semi-implicit. In other words, DSIVI performs inference in models where the prior and the posterior can be expressed as an intractable infinite mixture of some analytic density with a highly flexible implicit mixing distribution. We provide a sandwich bound on the evidence lower bound (ELBO) objective that can be made arbitrarily tight. Unlike discriminator-based and kernel-based approaches to implicit variational inference, DSIVI optimizes a proper lower bound on ELBO that is asymptotically exact. We evaluate DSIVI on a set of problems that benefit from implicit priors. In particular, we show that DSIVI gives rise to a simple modification of VampPrior, the current state-of-the-art prior for variational autoencoders, which improves its performance.