 Bayesian Inference


Mining for Dark Matter Substructure: Inferring subhalo population properties from strong lenses with machine learning

arXiv.org Machine Learning

The subtle and unique imprint of dark matter substructure on extended arcs in strong lensing systems contains a wealth of information about the properties and distribution of dark matter on small scales and, consequently, about the underlying particle physics. However, teasing out this effect poses a significant challenge since the likelihood function for realistic simulations of population-level parameters is intractable. We apply recently-developed simulation-based inference techniques to the problem of substructure inference in galaxy-galaxy strong lenses. By leveraging additional information extracted from the simulator, neural networks are efficiently trained to estimate likelihood ratios associated with population-level parameters characterizing substructure. Through proof-of-principle application to simulated data, we show that these methods can provide an efficient and principled way to simultaneously analyze an ensemble of strong lenses, and can be used to mine the large sample of lensing images deliverable by near-future surveys for signatures of dark matter substructure.


Efron-Stein PAC-Bayesian Inequalities

arXiv.org Machine Learning

We prove semi-empirical concentration inequalities for random variables which are given as possibly nonlinear functions of independent random variables. These inequalities characterize the concentration of the random variable in terms of the data/distribution-dependent Efron-Stein (ES) estimate of its variance and they do not require any additional assumptions on the moments. In particular, this allows us to state semi-empirical Bernstein inequalities for general functions of unbounded random variables, which gives user-friendly concentration bounds for cases where related methods (entropy method / bounded differences) might be more challenging to apply. We extend these results to Efron-Stein PAC-Bayesian inequalities which hold for arbitrary probability kernels that define a random, data-dependent choice of the function of interest. Finally, we demonstrate a number of applications, including PAC-Bayesian generalization bounds for unbounded loss functions, empirical Bernstein-type generalization bounds, new truncation-free bounds for off-policy evaluation with Weighted Importance Sampling (WIS), and off-policy PAC-Bayesian learning with WIS.
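
For orientation, the classical Efron-Stein inequality that underlies an ES variance estimate can be written as below. This is the textbook form only; the semi-empirical estimate used in the paper may differ in detail, and the symbols $f$, $X_i$ and $X_i'$ are notation assumed here, not taken from the abstract.

```latex
\operatorname{Var}\bigl(f(X)\bigr)
\;\le\;
\frac{1}{2}\sum_{i=1}^{n}
\mathbb{E}\Bigl[\bigl(f(X) - f(X^{(i)})\bigr)^{2}\Bigr],
\qquad
X^{(i)} = (X_1,\dots,X_{i-1},\,X_i',\,X_{i+1},\dots,X_n),
```

where $X = (X_1,\dots,X_n)$ has independent coordinates and each $X_i'$ is an independent copy of $X_i$. The bound needs no moment assumptions beyond square integrability of $f(X)$.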


Stochastic quasi-Newton with line-search regularization

arXiv.org Machine Learning

In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these ideas to the stochastic setting by employing a highly flexible model for the Hessian and infer its value based on observing noisy gradients. In addition, we propose a stochastic counterpart to standard line-search procedures and demonstrate the utility of this combination on maximum likelihood identification for general nonlinear state space models.


Pathologies of Factorised Gaussian and MC Dropout Posteriors in Bayesian Neural Networks

arXiv.org Machine Learning

Neural networks provide state-of-the-art performance on a variety of tasks. However, they are often overconfident when making predictions. This inability to properly account for uncertainty limits their application to high-risk decision making, active learning and Bayesian optimisation. To address this, Bayesian inference has been proposed as a framework for improving uncertainty estimates. In practice, Bayesian neural networks rely on poorly understood approximations for computational tractability. We prove that two commonly used approximation methods, the factorised Gaussian assumption and Monte Carlo dropout, lead to pathological estimates of the predictive uncertainty in single hidden layer ReLU networks. This indicates that more flexible approximations are needed to obtain reliable uncertainty estimates.


Data Selection for Short Term load forecasting

arXiv.org Artificial Intelligence

Power load forecasting with machine learning is a fairly mature application of artificial intelligence, and it is indispensable in operation, control and planning. Data selection techniques have rarely been used in this application. However, such techniques could be beneficial, because the assumption that the data are identically distributed clearly does not hold in load forecasting; the data are instead cyclostationary. In this work we present a fully automatic methodology to determine the most adequate data for training a predictor which is based on a full Bayesian probabilistic model. We assess the performance of the method with experiments based on real, publicly available data recorded over several years in the United States of America.


Bayesian Machine Learning in Python: A/B Testing

#artificialintelligence

Bayesian Machine Learning in Python: A/B Testing (Udemy), created by Lazy Programmer Inc. In this course, while we will do traditional A/B testing in order to appreciate its complexity, what we will eventually get to is the Bayesian machine learning way of doing things. First, we'll see if we can improve on traditional A/B testing with adaptive methods. These all help you solve the explore-exploit dilemma. What you'll learn: use adaptive algorithms to improve A/B testing performance; understand the difference between Bayesian and frequentist statistics; apply Bayesian methods to A/B testing.
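
As an illustration of an adaptive Bayesian approach to the explore-exploit dilemma, here is a minimal sketch of Thompson sampling with Beta posteriors over conversion rates. The course description does not name a specific algorithm, so the choice of Thompson sampling, the function name `thompson_sampling`, and the simulated conversion rates are all assumptions made for this example.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate a Bernoulli A/B test driven by Thompson sampling.

    true_rates: hypothetical conversion rate of each variant (unknown
                to the algorithm; used only to simulate visitor outcomes).
    Returns per-variant success and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible conversion rate per variant from its
        # Beta(1 + successes, 1 + failures) posterior (uniform prior).
        samples = [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(n_arms)
        ]
        # Show the variant whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])

        # Simulate whether this visitor converts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


if __name__ == "__main__":
    wins, losses = thompson_sampling([0.05, 0.5], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("impressions per variant:", pulls)
```

Unlike a fixed 50/50 split, this scheme shifts traffic toward the better-performing variant as evidence accumulates, while the posterior sampling keeps some exploration of the other variant.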


Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

arXiv.org Machine Learning

This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals. Conventional neural separation methods require a lot of supervised data to achieve excellent performance. Although multichannel methods based on spatial information can work without such training data, they are often sensitive to parameter initialization and degrade when the sources are located close to each other. The proposed method uses a cost function based on a spatial model called a complex Gaussian mixture model (cGMM). This model has the time-frequency (TF) masks and directions of arrival (DoAs) of sources as latent variables and is used for training separation and localization networks that respectively estimate these variables. This joint training solves the frequency permutation ambiguity of the spatial model in a unified deep Bayesian framework. In addition, the pre-trained network can be used not only for conducting monaural separation but also for efficiently initializing a multichannel separation algorithm. Experimental results with simulated speech mixtures showed that our method outperformed a conventional initialization method.


Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness

arXiv.org Machine Learning

We develop a three-step diversity ensemble creation algorithm: (1) Creating a pool of candidate ensemble member models, or so-called base models; (2) Creating a pool of candidate ensemble teams with their diversity scores higher than the pre-defined minimum diversity threshold; and (3) Developing robust ensemble consensus methods, which can effectively combine, rank and integrate predictions from members of an ensemble committee to produce high accuracy ensemble prediction output against adversarial examples. Different ensemble creation methods tend to have varying level of diversity. A. Creating Ensembles of Type 1 Diversity. We want to construct a pool of N redundant DNN models trained on the same learning task as the base classifiers. Preferably, the best ensemble committee members are those base classifiers that are relatively diverse and have high individual test accuracy. The type 1 diversity ensemble creation algorithm requires that every base model in the pool meets the type 1 diversity and has high benign test accuracy comparable to that of the target model under protection. One approach is to add one member model to the pool at a time. Assume that we initialize the pool with a privately trained DNN model. We only allow the next model to be added to the pool if it is trained independently using different hyper-parameters or different neural network structures or algorithms and it meets the high benign test accuracy requirement.
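
The one-at-a-time pool construction described above could be sketched roughly as follows. The excerpt does not define type 1 diversity, so this example substitutes a simple pairwise disagreement rate as a stand-in; that measure, the thresholds, and the names `build_pool`, `accuracy` and `disagreement` are all hypothetical. Models are represented only by their predictions on a shared benign test set.

```python
def accuracy(preds, labels):
    """Fraction of test examples a model classifies correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Fraction of test examples on which two models disagree.

    Assumption: used here as a placeholder diversity score; the paper's
    type 1 diversity criterion is not specified in this excerpt.
    """
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def build_pool(seed_preds, candidates, labels, min_accuracy, min_diversity):
    """Grow a pool of base models by adding one candidate at a time.

    seed_preds: predictions of the privately trained model that
                initializes the pool.
    candidates: predictions of independently trained candidate models.
    """
    pool = [seed_preds]
    for preds in candidates:
        # Requirement: high benign test accuracy.
        if accuracy(preds, labels) < min_accuracy:
            continue
        # Requirement (placeholder): diverse w.r.t. every current member.
        if all(disagreement(preds, member) >= min_diversity for member in pool):
            pool.append(preds)
    return pool


if __name__ == "__main__":
    labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    seed = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]       # one mistake (index 0)
    duplicate = list(seed)                       # accurate but not diverse
    diverse = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1]     # one mistake (index 1)
    weak = [0] * 10                              # diverse but inaccurate
    pool = build_pool(seed, [duplicate, diverse, weak], labels, 0.8, 0.1)
    print(len(pool))
```

In this toy run only the candidate that is both accurate and sufficiently different from the seed model joins the pool.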


Bayes EMbedding (BEM): Refining Representation by Integrating Knowledge Graphs and Behavior-specific Networks

arXiv.org Machine Learning

Low-dimensional embeddings of knowledge graphs and behavior graphs have proved remarkably powerful in a variety of tasks, from predicting unobserved edges between entities to content recommendation. The two types of graphs can contain distinct and complementary information for the same entities/nodes. However, previous works focus either on knowledge graph embedding or behavior graph embedding, while few works consider both in a unified way. Here we present BEM, a Bayesian framework that incorporates the information from knowledge graphs and behavior graphs. To be more specific, BEM takes as prior the pre-trained embeddings from the knowledge graph, and integrates them with the pre-trained embeddings from the behavior graphs via a Bayesian generative model. BEM is able to mutually refine the embeddings from both sides while preserving their own topological structures. To show the superiority of our method, we conduct a range of experiments on three benchmark datasets: node classification, link prediction, triplet classification on two small datasets related to Freebase, and item recommendation on a large-scale e-commerce dataset.


On the overestimation of widely applicable Bayesian information criterion

arXiv.org Machine Learning

The widely applicable Bayesian information criterion (Watanabe, 2013) applies to both regular and singular models in the model selection problem. This criterion tends to overestimate the log marginal likelihood. We identify the overestimating term of this criterion. Adjusting the term gives an asymptotically unbiased estimator of the leading two terms of the asymptotic expansion of the log marginal likelihood. In numerical experiments on regular and singular models, the adjustment resulted in smaller bias than the original criterion.