

 Bayesian Inference


Shallow Neural Hawkes: Non-parametric kernel estimation for Hawkes processes

arXiv.org Machine Learning

Multi-dimensional Hawkes processes (MHPs) are a class of self- and mutually exciting point processes with a wide range of applications, from the prediction of earthquakes to the modelling of order books in high-frequency trading. This paper makes two major contributions. First, we find an unbiased estimator of the log-likelihood of the Hawkes process, which enables efficient use of stochastic gradient descent for maximum likelihood estimation. Second, we propose a specific single-hidden-layer neural network for the non-parametric estimation of the underlying kernels of the MHP. We evaluate the proposed model on both synthetic and real datasets and find that it performs comparably to or better than existing estimation methods. Using a shallow neural network ensures that we do not compromise the interpretability of the Hawkes model, while retaining the flexibility to estimate any non-standard Hawkes excitation kernel.
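The role of an unbiased log-likelihood estimator can be illustrated on the simplest case. The minimal sketch below assumes a univariate Hawkes process with an exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$ (a parametric stand-in, not the paper's neural kernel). The integral term $\int_0^T \lambda(t)\,dt$ is estimated by uniform Monte Carlo sampling, since $\mathbb{E}[T\lambda(U)]$ equals that integral for $U \sim \mathrm{Uniform}(0, T)$, and the sum over events is subsampled with replacement and rescaled. This is one simple way to obtain an unbiased estimate for SGD, not necessarily the estimator derived in the paper; all function names are hypothetical.

```python
import numpy as np

def intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum over t_i < t of alpha * beta * exp(-beta * (t - t_i))."""
    t = np.atleast_1d(t)[:, None]
    dt = t - events[None, :]
    # Clip before exponentiating so masked-out future events cannot overflow
    kernel = alpha * beta * np.exp(-beta * np.clip(dt, 0.0, None))
    return mu + np.where(dt > 0, kernel, 0.0).sum(axis=1)

def compensator_exact(T, events, mu, alpha, beta):
    """Closed-form integral of lambda over [0, T] for the exponential kernel."""
    return mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - events)))

def compensator_mc(T, events, mu, alpha, beta, n_samples, rng):
    """Unbiased estimate: E[T * lambda(U)] equals the integral of lambda over [0, T]."""
    u = rng.uniform(0.0, T, size=n_samples)
    return T * intensity(u, events, mu, alpha, beta).mean()

def log_likelihood_mc(T, events, mu, alpha, beta, batch_size, n_samples, rng):
    """Unbiased log-likelihood: rescaled minibatch of event terms minus a Monte Carlo integral."""
    idx = rng.integers(0, len(events), size=batch_size)
    log_terms = np.log(intensity(events[idx], events, mu, alpha, beta))
    event_term = len(events) / batch_size * log_terms.sum()
    return event_term - compensator_mc(T, events, mu, alpha, beta, n_samples, rng)

events = np.array([0.5, 1.2, 1.3, 3.0, 4.1])   # hypothetical event times on [0, 5]
rng = np.random.default_rng(0)
exact = compensator_exact(5.0, events, 0.4, 0.6, 2.0)
estimate = compensator_mc(5.0, events, 0.4, 0.6, 2.0, 200_000, rng)
```

Because both pieces are unbiased, their difference is an unbiased estimate of the log-likelihood, which is what SGD requires; the Monte Carlo sample size only trades variance for cost.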


You say Normalizing Flows I see Bayesian Networks

arXiv.org Machine Learning

Normalizing flows have emerged as an important family of deep neural networks for modelling complex probability distributions. In this note, we revisit their coupling and autoregressive transformation layers as probabilistic graphical models and show that they reduce to Bayesian networks with a pre-defined topology and a learnable density at each node. From this new perspective, we provide three results. First, we show that stacking multiple transformations in a normalizing flow relaxes independence assumptions and entangles the model distribution. Second, we show that a fundamental leap of capacity emerges when the depth of affine flows exceeds 3 transformation layers. Third, we prove the non-universality of the affine normalizing flow, regardless of its depth.
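The graphical-model reading of a coupling layer can be checked on a two-dimensional example. In the minimal sketch below (assumptions: NumPy, a 1/1 split of a 2-D input, and hypothetical fixed conditioner functions `scale_net` and `shift_net` standing in for learned networks), the first coordinate passes through unchanged, while the second is transformed affinely with parameters that depend only on the first. In Bayesian-network terms the layer contributes the single edge $x_1 \to x_2$, and the Jacobian is triangular, so its log-determinant is just the predicted log-scale.

```python
import numpy as np

def scale_net(x1):
    """Hypothetical log-scale conditioner; a trained network would go here."""
    return np.tanh(x1)

def shift_net(x1):
    """Hypothetical shift conditioner."""
    return 0.5 * x1 ** 2

def coupling_forward(x):
    """Affine coupling on inputs of shape (n, 2): y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""
    x1, x2 = x[:, :1], x[:, 1:]
    s, t = scale_net(x1), shift_net(x1)
    y = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
    # Lower-triangular Jacobian: log|det| is the log-scale applied to the second coordinate
    log_det = s[:, 0]
    return y, log_det

def coupling_inverse(y):
    """Exact inverse: recover x2 from y2 using the untouched first coordinate."""
    y1, y2 = y[:, :1], y[:, 1:]
    s, t = scale_net(y1), shift_net(y1)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)

x = np.array([[0.3, -1.2], [1.5, 0.7]])
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
```

Stacking such layers with alternating splits adds the reverse edges as well, which is the mechanism by which depth relaxes the independence assumptions of a single layer.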


Adaptive quadrature schemes for Bayesian inference via active learning

arXiv.org Machine Learning

Numerical integration and emulation are fundamental topics across scientific fields. We propose novel adaptive quadrature schemes based on an active learning procedure. We consider an interpolative approach for building a surrogate posterior density, combining it with Monte Carlo sampling methods and other quadrature rules. The nodes of the quadrature are sequentially chosen by maximizing a suitable acquisition function, which takes into account the current approximation of the posterior and the positions of the nodes. This maximization does not require additional evaluations of the true posterior. We introduce two specific schemes based on Gaussian and Nearest Neighbors (NN) bases. For the Gaussian case, we also provide a novel procedure for fitting the bandwidth parameter, in order to build a suitable emulator of a density function. With both techniques, we always obtain a positive estimate of the marginal likelihood (a.k.a. Bayesian evidence). An equivalent importance sampling interpretation is also described, which allows the design of extended schemes. Several theoretical results are provided and discussed. Numerical results show the advantage of the proposed approach, including a challenging inference problem in an astronomical dynamical model, with the goal of revealing the number of planets orbiting a star.
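The sketch below illustrates the nearest-neighbour (NN) idea in one dimension under simplifying assumptions that are ours rather than the paper's: the emulator is piecewise constant (each point takes the density value at its closest node), the evidence is the exact integral of that emulator over the nodes' Voronoi cells, and the hypothetical acquisition function is the emulated density times the distance to the nearest node, maximized over a fixed candidate grid. The acquisition uses only the emulator, so each iteration costs exactly one evaluation of the true (unnormalized) posterior, and the estimate stays positive whenever the evaluated density values are positive.

```python
import numpy as np

def nn_emulator(x, nodes, values):
    """Nearest-neighbour emulator: each x takes the density value of its closest node."""
    idx = np.argmin(np.abs(x[:, None] - nodes[None, :]), axis=1)
    return values[idx], np.abs(x - nodes[idx])

def nn_evidence(nodes, values, lo, hi):
    """Exact integral of the piecewise-constant emulator over the 1-D Voronoi cells."""
    order = np.argsort(nodes)
    n_sorted, v_sorted = nodes[order], values[order]
    edges = np.concatenate([[lo], 0.5 * (n_sorted[1:] + n_sorted[:-1]), [hi]])
    return np.sum(v_sorted * np.diff(edges))

def adaptive_nn_quadrature(log_post, lo, hi, n_init=5, n_iter=60, n_grid=2001):
    nodes = np.linspace(lo, hi, n_init)
    values = np.exp(log_post(nodes))
    grid = np.linspace(lo, hi, n_grid)
    for _ in range(n_iter):
        # Hypothetical acquisition: emulated density times distance to the nearest node.
        # It needs no extra evaluations of the true posterior.
        f_hat, dist = nn_emulator(grid, nodes, values)
        x_new = grid[np.argmax(f_hat * dist)]
        nodes = np.append(nodes, x_new)
        # The only true-posterior evaluation in this iteration
        values = np.append(values, np.exp(log_post(np.array([x_new])))[0])
    return nodes, values, nn_evidence(nodes, values, lo, hi)

# Unnormalized standard normal on [-6, 6]; its evidence is close to sqrt(2 * pi) ~ 2.5066
nodes, values, evidence = adaptive_nn_quadrature(lambda x: -0.5 * x ** 2, -6.0, 6.0)
```

In one dimension, with the interval endpoints among the nodes, this NN rule coincides with the composite trapezoidal rule on the adaptively chosen nodes, which is why it converges well once the high-density region has been refined.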


Variational Mutual Information Maximization Framework for VAE Latent Codes with Continuous and Discrete Priors

arXiv.org Machine Learning

Learning interpretable and disentangled representations of data is a key topic in machine learning research. The Variational Autoencoder (VAE) is a scalable method for learning directed latent variable models of complex data. It employs a clear and interpretable objective that can be easily optimized. However, this objective does not provide an explicit measure of the quality of latent variable representations, which may therefore be of poor quality. We propose a Variational Mutual Information Maximization Framework for VAE to address this issue. Unlike other methods, it provides an explicit objective that maximizes a lower bound on the mutual information between latent codes and observations. The objective acts as a regularizer that prevents the VAE from ignoring the latent variable and allows one to select particular components of it to be most informative with respect to the observations. In addition, the proposed framework provides a way to evaluate the mutual information between latent codes and observations for a fixed VAE model. We have conducted experiments on VAE models with Gaussian latent variables and with joint Gaussian and discrete latent variables. Our results illustrate that the proposed approach strengthens relationships between latent codes and observations and improves the learned representations.
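To make "a lower bound on mutual information" concrete, the sketch below uses the standard Barber-Agakov bound, $I(z; x) \geq H(z) + \mathbb{E}_{p(z, x)}[\log q(z \mid x)]$, which holds for any auxiliary distribution $q$. We assume, without confirming it from the abstract, that the framework's bound is of this form. On a hypothetical linear-Gaussian toy ($z \sim \mathcal{N}(0, 1)$, $x = z + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$), the bound is tight when $q$ is the exact posterior and collapses to about zero when $q$ ignores $x$, which mirrors a VAE that ignores its latent code.

```python
import numpy as np

def gaussian_logpdf(z, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - (z - mean) ** 2 / (2 * var)

def barber_agakov_bound(z, x, q_mean, q_var):
    """Monte Carlo estimate of H(z) + E[log q(z | x)] for a Gaussian q with mean q_mean(x)."""
    entropy_z = 0.5 * np.log(2 * np.pi * np.e)   # H(z) for the standard normal prior
    return entropy_z + np.mean(gaussian_logpdf(z, q_mean(x), q_var))

rng = np.random.default_rng(0)
sigma = 0.5
z = rng.standard_normal(200_000)               # latent codes
x = z + sigma * rng.standard_normal(z.size)    # observations
exact_mi = 0.5 * np.log(1 + 1 / sigma ** 2)

# Exact posterior q(z | x) = N(x / (1 + sigma^2), sigma^2 / (1 + sigma^2)): the bound is tight
tight = barber_agakov_bound(z, x, lambda x: x / (1 + sigma ** 2), sigma ** 2 / (1 + sigma ** 2))
# A q that ignores x entirely: the bound drops to about 0
ignoring = barber_agakov_bound(z, x, lambda x: np.zeros_like(x), 1.0)
```

Maximizing such a bound over $q$ (and over the encoder) therefore rewards latent codes that can be recovered from the observations.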


A probabilistic generative model for semi-supervised training of coarse-grained surrogates and enforcing physical constraints through virtual observables

arXiv.org Machine Learning

The data-centric construction of inexpensive surrogates for fine-grained physical models has been at the forefront of computational physics due to its significant utility in many-query tasks such as uncertainty quantification. Recent efforts have taken advantage of enabling technologies from machine learning (e.g., deep neural networks) in combination with simulation data. While such strategies have shown promise even in higher-dimensional problems, they generally require large amounts of training data, even though the construction of surrogates is by definition a Small Data problem. Rather than employing data-based loss functions, it has been proposed to make use of the governing equations (in the simplest case at collocation points) in order to imbue the training of otherwise black-box-like interpolators with domain knowledge. The present paper provides a flexible, probabilistic framework that accounts for physical structure and information both in the training objectives and in the surrogate model itself. We advocate a probabilistic (Bayesian) model in which equalities that are available from the physics (e.g., residuals, conservation laws) can be introduced as virtual observables and can provide additional information through the likelihood. We further advocate a generative model, i.e., one that attempts to learn the joint density of inputs and outputs and can make use of unlabeled data (i.e., only inputs) in a semi-supervised fashion, in order to promote the discovery of lower-dimensional embeddings that are nevertheless predictive of the fine-grained model's output.
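A virtual observable can be illustrated with a linear-Gaussian toy that is our assumption, not the paper's model: a Gaussian prior over a three-component output $y$ and a hypothetical conservation law $a^\top y = b$. The residual $a^\top y - b$ is "observed" to be 0 with noise variance $\sigma_v^2$, and standard Gaussian conditioning yields the posterior. As $\sigma_v \to 0$ the posterior concentrates on the constraint, while a large $\sigma_v$ enforces it only weakly.

```python
import numpy as np

def condition_on_virtual_observable(mu, Sigma, a, b, sigma_v):
    """Condition N(mu, Sigma) on the virtual observation 0 = a @ y - b + noise, noise ~ N(0, sigma_v**2)."""
    Sa = Sigma @ a
    s = a @ Sa + sigma_v ** 2            # predictive variance of the residual
    gain = Sa / s
    post_mu = mu + gain * (b - a @ mu)   # shift the mean toward the constraint surface
    post_Sigma = Sigma - np.outer(gain, Sa)
    return post_mu, post_Sigma

mu = np.array([1.0, 2.0, 0.5])
Sigma = 0.5 * np.eye(3)
a = np.ones(3)   # hypothetical conservation law: the components must sum to b
b = 4.0
tight_mu, tight_Sigma = condition_on_virtual_observable(mu, Sigma, a, b, sigma_v=1e-3)
loose_mu, _ = condition_on_virtual_observable(mu, Sigma, a, b, sigma_v=10.0)
```

The physics enters purely through the likelihood, so the same mechanism applies when $y$ is the output of a learned surrogate rather than a fixed Gaussian; the nonlinear case then needs approximate inference instead of this closed form.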


Toward Optimal Probabilistic Active Learning Using a Bayesian Approach

arXiv.org Machine Learning

Gathering labeled data to train well-performing machine learning models is one of the critical challenges in many applications. Active learning aims to reduce labeling costs by allocating costly labeling resources efficiently and effectively. In this article, we propose a decision-theoretic selection strategy that (1) directly optimizes the gain in misclassification error, and (2) uses a Bayesian approach, introducing a conjugate prior distribution to determine the class posterior and thereby handle uncertainty. By reformulating existing selection strategies within our proposed model, we can explain which aspects the current state of the art does not cover and why this leads to the superior performance of our approach. Extensive experiments on a wide variety of datasets and different kernels validate our claims.


Meta Learning as Bayes Risk Minimization

arXiv.org Machine Learning

Meta-learning is a family of methods that use a set of interrelated tasks to learn a model that can quickly learn a new query task from a possibly small contextual dataset. In this study, we use a probabilistic framework to formalize what it means for two tasks to be related and reframe the meta-learning problem as a problem of Bayesian risk minimization (BRM). In our formulation, the BRM-optimal solution is given by the predictive distribution computed from the posterior distribution of the task-specific latent variable conditioned on the contextual dataset, and this justifies the philosophy of Neural Processes.

We show that, when we cast the meta-learning problem as BRM, the optimal solution is given by the predictive distribution computed from the posterior distribution of the latent variable conditioned on the contextual dataset. This result justifies the use of the predictive distribution in many previous studies of meta-learning, such as (Edwards & Storkey, 2017; Gordon et al., 2018; Garnelo et al., 2018). However, the optimality of the predictive distribution cannot be guaranteed if one uses an approximation of the posterior distribution that violates the way the posterior distribution changes with the contextual dataset, and this is unfortunately the case for most of the aforementioned works. For example, the variance of the posterior in these works does not converge to 0 as we take …
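The property at stake, that the exact posterior over the task latent tightens as the context grows, is easy to see in a conjugate model. The sketch below assumes a hypothetical scalar task latent $\theta \sim \mathcal{N}(0, \tau^2)$ with context observations $y_k \sim \mathcal{N}(\theta, s^2)$. The posterior variance $1/(1/\tau^2 + n/s^2)$ goes to 0 as the context size $n$ grows, and the variance of the predictive distribution for a new query approaches the noise variance $s^2$. It illustrates the BRM-optimal predictive in a toy setting, not the paper's model.

```python
import numpy as np

def gaussian_task_posterior(context, prior_var=1.0, noise_var=0.25):
    """Exact posterior over theta ~ N(0, prior_var) given y_k ~ N(theta, noise_var),
    plus the variance of the posterior predictive for a new query y*."""
    n = len(context)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * np.sum(context) / noise_var
    pred_var = post_var + noise_var   # predictive integrates theta out of N(y*; theta, noise_var)
    return post_mean, post_var, pred_var

rng = np.random.default_rng(0)
theta_task = 0.8                      # hypothetical latent of the query task
sizes = [0, 1, 10, 100, 10_000]
results = [gaussian_task_posterior(theta_task + 0.5 * rng.standard_normal(n)) for n in sizes]
post_vars = [r[1] for r in results]
```

An approximate posterior whose variance does not shrink with $n$ would keep inflating the predictive variance above $s^2$ no matter how much context is supplied.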


Fully probabilistic quasar continua predictions near Lyman-$\alpha$ with conditional neural spline flows

arXiv.org Machine Learning

Measurement of the red damping wing of neutral hydrogen in quasar spectra provides a probe of the epoch of reionization in the early Universe. Such quantification requires precise and unbiased estimates of the intrinsic continua near Lyman-$\alpha$ (Ly$\alpha$), a challenging task given the highly variable Ly$\alpha$ emission profiles of quasars. Here, we introduce a fully probabilistic approach to intrinsic continua prediction. We frame the problem as a conditional density estimation task and explicitly model the distribution over plausible blue-side continua ($1190\ \text{Å} \leq \lambda_{\text{rest}} < 1290\ \text{Å}$) conditional on the red-side spectrum ($1290\ \text{Å} \leq \lambda_{\text{rest}} < 2900\ \text{Å}$) using normalizing flows. Our approach achieves state-of-the-art precision and accuracy, allows for sampling one thousand plausible continua in less than a tenth of a second, and can natively provide confidence intervals on the blue-side continua via Monte Carlo sampling. We measure the damping wing effect in two $z>7$ quasars and estimate the volume-averaged neutral fraction of hydrogen from each, finding $\bar{x}_\text{HI}=0.304 \pm 0.042$ for ULAS J1120+0641 ($z=7.09$) and $\bar{x}_\text{HI}=0.384 \pm 0.133$ for ULAS J1342+0928 ($z=7.54$).
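Confidence intervals via Monte Carlo sampling amount to taking pointwise percentiles across sampled continua. The sketch below assumes only that some conditional sampler returns an array of shape `(n_samples, n_pixels)`; a hypothetical Gaussian-scatter stand-in replaces the trained flow, and the pixel grid and continuum shape are invented for illustration.

```python
import numpy as np

def mc_intervals(samples, level=0.68):
    """Pointwise central intervals from Monte Carlo samples of shape (n_samples, n_pixels)."""
    tail = 100.0 * (1.0 - level) / 2.0
    lo, med, hi = np.percentile(samples, [tail, 50.0, 100.0 - tail], axis=0)
    return lo, med, hi

# Hypothetical stand-in for a trained conditional flow: a fixed continuum plus Gaussian scatter
rng = np.random.default_rng(0)
wavelengths = np.linspace(1190.0, 1290.0, 5)      # rest-frame Angstrom, blue side
base = 1.0 + 0.002 * (1290.0 - wavelengths)
samples = base + 0.05 * rng.standard_normal((1000, wavelengths.size))
lo, med, hi = mc_intervals(samples)
```

Percentiles make no Gaussian assumption, so the same function yields asymmetric intervals when the sampled continua are skewed.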


Variational Bayesian Inference for Crowdsourcing Predictions

arXiv.org Artificial Intelligence

Crowdsourcing has emerged as an effective means of performing a number of machine learning tasks, such as annotating and labelling images and other datasets. In most early crowdsourcing settings, the task was classification, that is, assigning one of a discrete set of labels to each task. Recently, however, more complex tasks have been attempted, including asking crowdsourcing workers to assign continuous labels, or predictions. In essence, this uses crowdsourcing for function estimation. Our motivation is to drive applications such as collaborative prediction, that is, harnessing the wisdom of the crowd to predict quantities more accurately. To this end, we propose a Bayesian approach aimed specifically at alleviating overfitting, a typical impediment to accurate prediction models in practice. In particular, we develop a variational Bayesian technique for two different worker noise models: one that assumes workers' noise is independent and another that assumes workers' noise has a latent low-rank structure. Our evaluations on synthetic and real-world datasets demonstrate that these Bayesian approaches perform significantly better than existing non-Bayesian approaches and are thus potentially useful for this class of crowdsourcing problems.
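For the independent-noise case, a mean-field variational Bayes scheme fits in a few lines. The sketch below is our assumption of a simple conjugate version, not the authors' model: each item has a latent value $f_i$ with a zero-mean Gaussian prior, worker $j$ reports $y_{ij} = f_i + \varepsilon_{ij}$ with $\varepsilon_{ij} \sim \mathcal{N}(0, 1/\tau_j)$, each precision $\tau_j$ has a Gamma prior, and every worker answers every item. Alternating the two closed-form updates down-weights noisy workers automatically.

```python
import numpy as np

def vb_independent_noise(Y, prior_var=10.0, a0=1.0, b0=0.1, n_iter=50):
    """Mean-field VB for y_ij = f_i + eps_ij, eps_ij ~ N(0, 1/tau_j), f_i ~ N(0, prior_var),
    tau_j ~ Gamma(a0, rate=b0). Y has shape (n_items, n_workers) with no missing answers."""
    n_items, n_workers = Y.shape
    e_tau = np.ones(n_workers)                      # E[tau_j] under q(tau_j)
    for _ in range(n_iter):
        # q(f_i) = N(m_i, v): a precision-weighted average of the workers' answers
        v = 1.0 / (1.0 / prior_var + e_tau.sum())
        m = v * (Y @ e_tau)
        # q(tau_j) = Gamma(a, b_j): the expected squared residual includes the variance v of f_i
        a = a0 + 0.5 * n_items
        b = b0 + 0.5 * (((Y - m[:, None]) ** 2).sum(axis=0) + n_items * v)
        e_tau = a / b
    return m, e_tau

rng = np.random.default_rng(1)
f_true = rng.normal(0.0, 2.0, size=200)
noise_sd = np.array([0.1, 0.1, 0.2, 2.0, 3.0])     # hypothetical crowd: two very noisy workers
Y = f_true[:, None] + rng.standard_normal((200, 5)) * noise_sd
m, e_tau = vb_independent_noise(Y)
rmse_vb = np.sqrt(np.mean((m - f_true) ** 2))
rmse_mean = np.sqrt(np.mean((Y.mean(axis=1) - f_true) ** 2))
```

Compared with plain averaging, the learned precisions `e_tau` let the reliable workers dominate the estimate; the low-rank noise model in the abstract would replace the independent $\tau_j$ with a structured noise covariance.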


Sampling Techniques in Bayesian Target Encoding

arXiv.org Machine Learning

Target encoding is an effective technique for encoding categorical variables and is often used in machine learning systems that process tabular datasets with mixed numeric and categorical variables. Recently, an enhanced version of this encoding technique was proposed using conjugate Bayesian modeling. This paper presents a further development of the Bayesian encoding method using sampling techniques, which help extract information from the intra-category distribution of the target variable, improve generalization, and reduce target leakage.
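For a binary target, the conjugate version is a Beta-Binomial model per category. The sketch below assumes a Beta(1, 1) prior and shows two encodings: the posterior mean, and a sampled variant that draws each row's encoding from its category's Beta posterior. The sampled variant is our illustrative reading of "sampling techniques"; the paper's exact procedure may differ, and all function names are hypothetical.

```python
import numpy as np
from collections import defaultdict

def fit_beta_posteriors(categories, targets, alpha=1.0, beta=1.0):
    """Beta(alpha, beta) prior + Binomial likelihood -> Beta posterior parameters per category."""
    counts = defaultdict(lambda: [0, 0])            # category -> [positives, total]
    for c, y in zip(categories, targets):
        counts[c][0] += int(y)
        counts[c][1] += 1
    return {c: (alpha + pos, beta + tot - pos) for c, (pos, tot) in counts.items()}

def encode_mean(posteriors, categories, prior=(1.0, 1.0)):
    """Posterior-mean encoding; unseen categories fall back to the prior mean."""
    params = np.array([posteriors.get(c, prior) for c in categories])
    return params[:, 0] / params.sum(axis=1)

def encode_sampled(posteriors, categories, rng, prior=(1.0, 1.0)):
    """Sampled encoding: one independent Beta draw per row from its category's posterior."""
    params = np.array([posteriors.get(c, prior) for c in categories])
    return rng.beta(params[:, 0], params[:, 1])

post = fit_beta_posteriors(["a", "a", "a", "a", "b", "b"], [1, 1, 0, 1, 0, 0])
means = encode_mean(post, ["a", "b", "c"])
draws = encode_sampled(post, ["a", "b", "c"], np.random.default_rng(0))
```

Sampling injects noise whose scale shrinks as a category's count grows, so rare categories receive less confident encodings, which is one plausible way such a scheme can curb target leakage.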