
Bayesian Learning


Stein Variational Online Changepoint Detection with Applications to Hawkes Processes and Neural Networks

arXiv.org Machine Learning

Bayesian online changepoint detection (BOCPD) (Adams & MacKay, 2007) offers a rigorous and viable way to identify changepoints in complex systems. In this work, we introduce a Stein variational online changepoint detection (SVOCD) method that provides a computationally tractable generalization of BOCPD beyond the exponential family of probability distributions. We integrate the recently developed Stein variational Newton (SVN) method (Detommaso et al., 2018) with BOCPD to offer a fully online Bayesian treatment for a wide range of practically important situations. We apply the resulting method to two challenging and novel applications: Hawkes processes and long short-term memory (LSTM) neural networks. In both cases, we successfully demonstrate the efficacy of our method on real data.
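To make the BOCPD recursion concrete, here is a minimal sketch of the Adams & MacKay run-length update for the conjugate case that SVOCD is meant to go beyond: a Gaussian likelihood with known variance, a Normal prior on each segment's mean, and a constant hazard rate. It is not SVOCD; the function name and default parameters are illustrative assumptions.

```python
import numpy as np


def bocpd_gaussian(data, hazard=1 / 100, mu0=0.0, var0=1.0, var_x=1.0):
    """Run-length posterior of BOCPD for a Normal-Normal conjugate model.

    Assumptions (illustrative): Gaussian observations with known variance var_x,
    a N(mu0, var0) prior on each segment mean, and a constant hazard rate.
    Returns R with R[t, r] = p(run length r | x_1..x_t).
    """
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mu, var = np.array([mu0]), np.array([var0])  # posterior of the mean per run length
    for t, x in enumerate(data, start=1):
        # Posterior predictive density of x under each current run length.
        pred_var = var + var_x
        pred = np.exp(-0.5 * (x - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        joint = R[t - 1, :t] * pred
        R[t, 1 : t + 1] = joint * (1 - hazard)  # growth: the run continues
        R[t, 0] = joint.sum() * hazard  # changepoint: the run resets
        R[t] /= R[t].sum()
        # Conjugate update for every continuing run; r = 0 restarts from the prior.
        new_var = 1.0 / (1.0 / var + 1.0 / var_x)
        new_mu = new_var * (mu / var + x / var_x)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R
```

In use, a changepoint shows up as the posterior mass in R[t] moving abruptly to a short run length after a shift in the data.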


Trust Region Value Optimization using Kalman Filtering

arXiv.org Machine Learning

Policy evaluation is a key process in reinforcement learning. It assesses a given policy by estimating the corresponding value function. When a parameterized function is used to approximate the value, it is common to optimize its parameters by minimizing the sum of squared Bellman temporal difference (TD) errors. However, this approach ignores certain distributional properties of both the errors and the value parameters. Taking these distributions into account in the optimization process can provide useful information on the confidence of the value estimate. In this work we propose to optimize the value by minimizing a regularized objective function that forms a trust region over its parameters. We present a novel optimization method, Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter. KOVA minimizes the regularized objective function by adopting a Bayesian perspective over both the value parameters and the noisy observed returns. This distributional perspective provides information on parameter uncertainty in addition to value estimates. We provide theoretical results for our approach and analyze the performance of the proposed optimizer on domains with large state and action spaces.
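As a minimal linear-case sketch of Kalman filtering over value parameters (not KOVA itself), assume a linear value function V(s) = phi(s)^T theta, a random-walk model for theta, and the reward treated as a noisy linear observation of theta. With a linear observation the Extended Kalman Filter reduces to the ordinary Kalman filter. The names `kalman_td_update`, `obs_var` and `proc_var` are hypothetical.

```python
import numpy as np


def kalman_td_update(theta, P, phi_s, phi_next, reward, gamma=0.99, obs_var=1.0, proc_var=1e-4):
    """One Kalman-filter step on linear value parameters (illustrative sketch).

    Assumed model: theta follows a random walk, and the observed reward satisfies
    reward = (phi(s) - gamma * phi(s'))^T theta + Gaussian noise of variance obs_var.
    """
    # Predict: inflate the covariance by the random-walk process noise.
    P = P + proc_var * np.eye(len(theta))
    # Linear observation vector, so the EKF reduces to a plain Kalman filter.
    h = phi_s - gamma * phi_next
    innovation = reward - h @ theta  # TD error under the current parameters
    S = h @ P @ h + obs_var  # innovation variance (scalar)
    K = P @ h / S  # Kalman gain
    theta = theta + K * innovation
    P = P - np.outer(K, h @ P)
    return theta, P
```

Each update is the minimizer of a squared TD error plus a quadratic penalty (theta - theta_prev)^T P^{-1} (theta - theta_prev) on moving away from the previous parameters, which is one way to read the "trust region" in the abstract; P also carries the parameter uncertainty.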


Neutron drip line in the Ca region from Bayesian model averaging

arXiv.org Machine Learning

The region of heavy calcium isotopes forms the frontier of experimental and theoretical nuclear structure research, where the basic concepts of nuclear physics are put to a stringent test. The recent discovery of the extremely neutron-rich nuclei around $^{60}$Ca [Tarasov, 2018] and the experimental determination of masses for $^{55-57}$Ca [Michimasa, 2018] provide unique information about the binding energy surface in this region. To assess the impact of these experimental discoveries on the extent of the nuclear landscape, we use global mass models and statistical machine learning to make predictions, with quantified levels of certainty, for bound nuclides between Si and Ti. Using a Bayesian model averaging analysis based on Gaussian-process-based extrapolations, we introduce the posterior probability $p_{ex}$ for each nucleus to be bound with respect to neutron emission. We find that extrapolations for drip-line locations, at which nuclear binding ends, are consistent across the global mass models used, despite significant variations between their raw predictions. In particular, considering the current experimental information and current global mass models, we predict that $^{68}$Ca has an average posterior probability $p_{ex}\approx 76\%$ of being bound with respect to two-neutron emission, while the nucleus $^{61}$Ca is likely to decay by emitting a neutron ($p_{ex}\approx 46\%$).
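For reference, the generic Bayesian model averaging form of such a posterior probability is shown below. It assumes $K$ global mass models $\mathcal{M}_k$ (each with its Gaussian-process correction), data $\mathcal{D}$, and a separation energy $S$ (one-neutron $S_{n}$ or two-neutron $S_{2n}$, depending on the channel) whose positivity means the nucleus is bound; the specific model weights used in the paper are not reproduced here.

$$
p_{ex}(Z,N) = \sum_{k=1}^{K} p(\mathcal{M}_k \mid \mathcal{D})\, p\big(S(Z,N) > 0 \mid \mathcal{M}_k, \mathcal{D}\big),
\qquad
p(\mathcal{M}_k \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{M}_k)\, p(\mathcal{M}_k)}{\sum_{l=1}^{K} p(\mathcal{D} \mid \mathcal{M}_l)\, p(\mathcal{M}_l)}
$$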


Online Estimation of Multiple Dynamic Graphs in Pattern Sequences

arXiv.org Machine Learning

Many time-series data, including text, movies, and biological signals, can be represented as sequences of correlated binary patterns. These patterns may be described by weighted combinations of a few dominant structures that underpin specific interactions among the binary elements. To extract the dominant correlation structures and their time-dependent contributions to generating the data, we model the dynamics of binary patterns using the state-space model of an Ising-type network composed of multiple undirected graphs. We provide a sequential Bayes algorithm that estimates the dynamics of the weights on the graphs while learning the graph structures online. This model can uncover overlapping graphs underlying the data better than a traditional orthogonal decomposition method, and it outperforms the original time-dependent full Ising model. We assess the performance of the method using simulated data and demonstrate that the spontaneous activity of cultured hippocampal neurons is represented by the dynamics of multiple graphs.
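As a small illustration of an Ising-type model built from several undirected graphs, the sketch below computes exact pattern probabilities when the coupling matrix is a weighted sum of graph adjacency matrices, J = sum_k w_k G_k. It assumes {0, 1} coding, a field vector h, and enumeration over all 2^N patterns (feasible only for small N). The time-varying weights and the paper's sequential Bayes filter are not shown, and all names are hypothetical.

```python
import itertools

import numpy as np


def multi_graph_ising_probs(h, graphs, weights):
    """Exact pattern probabilities for a small Ising-type model whose coupling
    matrix is a weighted sum of undirected graphs (symmetric, zero diagonal)."""
    J = sum(w * G for w, G in zip(weights, graphs))
    n = len(h)
    patterns = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    # Log-unnormalized probability: h^T x + sum_{i<j} J_ij x_i x_j.
    energy = patterns @ h + 0.5 * np.einsum("pi,ij,pj->p", patterns, J, patterns)
    log_z = np.logaddexp.reduce(energy)  # log partition function
    return patterns, np.exp(energy - log_z)
```

Raising the weight of one graph makes the patterns that co-activate its connected units more probable, which is the sense in which each graph contributes a "dominant structure".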


10 Major Machine Learning Algorithms And Their Application

#artificialintelligence

Algorithms are the smart and powerful soldiers of a complex machine learning model. In other words, machine learning algorithms are the core foundation when we work with data or train a model. In this article, you and I are going on a tour called "7 major machine learning algorithms and their application". The purpose of this tour is either to refresh your memory or to gain an essential understanding of machine learning algorithms. Along the way we will answer the main questions: what purpose machine learning algorithms serve, and where, when, and how to use them. Before going deeper, let's have a brief introduction. Machine learning algorithms are mainly classified into 3 broad categories, i.e., supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the machine is taught by example. Here the operator provides the machine learning algorithm with a dataset that includes the desired input and output variables. Using these variables, we generate a function that maps inputs to the desired outputs.
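The idea of generating a function from example input-output pairs can be shown with one of the simplest supervised learners, a least-squares line fit. The data below are made up for illustration.

```python
import numpy as np

# Made-up labelled examples: inputs x and desired outputs y (here y = 2x + 1).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# "Learn" a linear function y ~ a*x + b by least squares.
X = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)


def predict(new_x):
    # The learned function maps a new input to a predicted output.
    return a * new_x + b


print(round(predict(10.0), 6))  # 21.0
```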


Calibration with Bias-Corrected Temperature Scaling Improves Domain Adaptation Under Label Shift in Modern Neural Networks

arXiv.org Machine Learning

Label shift refers to the phenomenon where the marginal probability p(y) of observing a particular class changes between the training and test distributions while the conditional probability p(x|y) stays fixed. This is relevant in settings such as medical diagnosis, where a classifier trained to predict disease based on observed symptoms may need to be adapted to a different distribution where the baseline frequency of the disease is higher. Given calibrated estimates of p(y|x), one can apply an EM algorithm to correct for the shift in class imbalance between the training and test distributions without ever needing to calculate p(x|y). Unfortunately, modern neural networks typically fail to produce well-calibrated probabilities, compromising the effectiveness of this approach. Although Temperature Scaling can greatly reduce miscalibration in these networks, it can leave behind a systematic bias in the probabilities that still poses a problem. To address this, we extend Temperature Scaling with class-specific bias parameters, which largely eliminates systematic bias in the calibrated probabilities and allows for effective domain adaptation under label shift. We term our calibration approach "Bias-Corrected Temperature Scaling". On experiments with CIFAR10, we find that EM with Bias-Corrected Temperature Scaling significantly outperforms both EM with Temperature Scaling and the recently proposed Black-Box Shift Estimation.
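A minimal sketch of the pipeline the abstract describes: calibrate logits as softmax(logits / T + b) with a temperature T and per-class biases b, then run the classic EM prior-adjustment loop on unlabelled test inputs. It assumes T and b have already been fitted (for example by minimizing negative log-likelihood on held-out data, not shown), and the function names are hypothetical.

```python
import numpy as np


def bcts_probs(logits, T, b):
    """Bias-corrected temperature scaling: softmax(logits / T + b)."""
    z = logits / T + b
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def _adapt(probs, prior, train_prior):
    # Reweight training-domain posteriors by the ratio of new to training priors.
    w = probs * (prior / train_prior)
    return w / w.sum(axis=1, keepdims=True)


def em_label_shift(probs, train_prior, n_iter=100, tol=1e-8):
    """Estimate test-time class priors from calibrated p(y|x) on test inputs."""
    prior = train_prior.astype(float).copy()
    for _ in range(n_iter):
        # E-step (adapt posteriors) followed by M-step (average them).
        new_prior = _adapt(probs, prior, train_prior).mean(axis=0)
        converged = np.abs(new_prior - prior).max() < tol
        prior = new_prior
        if converged:
            break
    return prior, _adapt(probs, prior, train_prior)
```

The EM loop only sees p(y|x), never p(x|y), which is why miscalibrated or systematically biased probabilities feed directly into wrong prior estimates.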


Fitting A Mixture Distribution to Data: Tutorial

arXiv.org Machine Learning

This paper is a step-by-step tutorial on fitting a mixture distribution to data. It assumes only that the reader has a background in calculus and linear algebra; other required background is briefly reviewed before the main algorithm is explained. In explaining the main algorithm, fitting a mixture of two distributions is first detailed, with examples of fitting two Gaussians and two Poissons for the continuous and discrete cases, respectively. Thereafter, fitting several distributions in the general case is explained, and examples with several Gaussians (the Gaussian Mixture Model) and several Poissons are again provided. Model-based clustering, one of the applications of mixture distributions, is also introduced. Numerical simulations are provided for both the Gaussian and Poisson examples for clarity.
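The two-component Gaussian case can be sketched with the expectation-maximization (EM) algorithm for a 1-D mixture. This is a minimal sketch, not the tutorial's own code; the quantile-based initialization and the fixed iteration count are assumptions.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def fit_two_gaussians(x, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture."""
    # Crude initialization from the data's quartiles (an assumption).
    mu = np.percentile(x, [25, 75]).astype(float)
    sigma = np.full(2, x.std())
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = np.stack([pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma
```

For a Poisson mixture the same E-step applies with the Poisson pmf, and the M-step rate update is the responsibility-weighted mean of the counts.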


Context Aware Machine Learning

arXiv.org Machine Learning

We propose a principle for exploring context in machine learning models. Starting with the simple assumption that each observation (a random variable) may or may not depend on its context (conditioning variables), a conditional probability distribution is decomposed into two parts: context-free and context-sensitive. Then, by employing the log-linear word production model to relate random variables to their embedding-space representations and making use of the convexity of the natural exponential function, we show that the embedding of an observation can also be decomposed into a weighted sum of two vectors, representing its context-free and context-sensitive parts, respectively. This simple treatment of context provides a unified view of many existing deep learning models, leading to revised versions of these models that achieve significant performance gains. Specifically, our upgraded version of a recent sentence embedding model (Arora et al., 2017) not only outperforms the original one by a large margin, but also leads to a new, principled approach for composing the embeddings of bag-of-words features, as well as a new architecture for modeling attention in deep neural networks. More surprisingly, our new principle provides a novel understanding of the gates and equations defined by the long short-term memory (LSTM) model, which also leads to a new model that converges significantly faster and achieves much lower prediction errors. Furthermore, our principle also inspires a new type of generic neural network layer that more closely resembles real biological neurons than the traditional architecture of a linear mapping followed by a nonlinear activation. Its multi-layer extension provides a new principle for deep neural networks that subsumes the residual network (ResNet) as a special case, and its extension to the convolutional neural network model accounts for irrelevant input (e.g., the background in an image) in addition to filtering.
Our models are validated through a series of benchmark datasets and we show that in many cases, simply replacing existing layers with our context-aware counterparts is sufficient to significantly improve the results.
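The sentence embedding model the abstract upgrades (Arora et al., 2017) already separates a shared, context-free component from sentence-specific content. A minimal sketch of that baseline, assuming precomputed word vectors and unigram probabilities (the `sif_embeddings` name and the default a = 1e-3 are illustrative), weights each word by a / (a + p(w)), averages, and removes the projection on the first singular vector. It shows the original model, not the context-aware revision.

```python
import numpy as np


def sif_embeddings(sentences, word_vecs, word_freq, a=1e-3):
    """Weighted-average sentence embeddings with common-component removal."""
    # Frequent words get small weights a / (a + p(w)).
    emb = np.array([
        np.mean([a / (a + word_freq[w]) * word_vecs[w] for w in s], axis=0)
        for s in sentences
    ])
    # Remove the projection on the first right singular vector (shared direction).
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    u = vt[0]
    return emb - np.outer(emb @ u, u)
```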


Physics-Constrained Deep Learning for High-dimensional Surrogate Modeling and Uncertainty Quantification without Labeled Data

arXiv.org Machine Learning

Surrogate modeling and uncertainty quantification tasks for PDE systems are most often considered as supervised learning problems where input and output data pairs are used for training. The construction of such emulators is by definition a small-data problem, which poses challenges to deep learning approaches that have been developed to operate in the big-data regime. Even in cases where such models have been shown to have good predictive capability in high dimensions, they fail to address constraints in the data implied by the PDE model. This paper provides a methodology that incorporates the governing equations of the physical model in the loss/likelihood functions. The resulting physics-constrained deep learning models are trained without any labeled data (i.e., using only input data) and provide predictive responses comparable to those of data-driven models while obeying the constraints of the problem at hand. This work employs a convolutional encoder-decoder neural network approach as well as a conditional flow-based generative model for the solution of PDEs, surrogate model construction, and uncertainty quantification tasks. The methodology is posed as a minimization problem of the reverse Kullback-Leibler (KL) divergence between the model predictive density and the reference conditional density, where the latter is defined as the Boltzmann-Gibbs distribution at a given inverse temperature with the underlying potential relating to the PDE system of interest. The generalization capability of these models to out-of-distribution input is considered. Quantification and interpretation of the predictive uncertainty is provided for a number of problems.
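To illustrate a label-free, physics-based loss, here is a toy sketch on the 1-D Poisson equation -u'' = f with zero boundary values, standing in for the paper's PDE systems (an assumption; the paper's equations and networks are not reproduced). The loss is the squared finite-difference residual of a candidate field, so no input-output pairs are needed, and `gibbs_log_density` shows how such a potential V defines an unnormalized Boltzmann-Gibbs density exp(-beta V) at inverse temperature beta. All names and the value beta = 100 are hypothetical.

```python
import numpy as np


def poisson_residual_loss(u, f, dx):
    """Label-free physics loss for -u'' = f on a uniform grid with u = 0 at both ends.

    u and f hold values at every grid point, boundaries included.
    """
    # Central-difference residual at interior points.
    interior = -(u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2 - f[1:-1]
    boundary = np.array([u[0], u[-1]])  # Dirichlet condition violations
    return np.mean(interior**2) + np.sum(boundary**2)


def gibbs_log_density(u, f, dx, beta=100.0):
    # Unnormalized log Boltzmann-Gibbs density with the physics loss as potential.
    return -beta * poisson_residual_loss(u, f, dx)
```

In a surrogate, u would be the network's output for a given input field, and minimizing this loss (or the reverse KL to the Gibbs density) replaces fitting to simulated solutions.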


A combined entropy and utility based generative model for large scale multiple discrete-continuous travel behaviour data

arXiv.org Machine Learning

Generative models, whether simple clustering algorithms or deep neural network architectures, have been developed as probabilistic estimation methods for dimension reduction or for modeling the underlying properties of data structures. Although their use has largely been limited to image recognition and classification, generative machine learning algorithms can be a powerful tool for travel behaviour research. In this paper, we examine the generative machine learning approach for analyzing multiple discrete-continuous (MDC) travel behaviour data to understand the underlying heterogeneity and correlation, increasing the representational power of such travel behaviour models. We show that generative models are conceptually similar to the choice selection behaviour process through information entropy and variational Bayesian inference. Specifically, we consider a restricted Boltzmann machine (RBM) based algorithm with a multiple discrete-continuous layer, formulated as a variational Bayesian inference optimization problem. We systematically describe the proposed machine learning algorithm and develop a process for analyzing travel behaviour data from a generative learning perspective. We show parameter stability from model analysis and simulation tests on an open dataset with multiple discrete-continuous dimensions and a size of 293,330 observations. For interpretability, we derive analytical methods for conditional probabilities as well as elasticities. Our results indicate that latent variables in generative models can consistently and accurately represent the joint distribution of multiple discrete-continuous variables. Lastly, we show that our model can generate statistically similar data distributions for travel forecasting and prediction.
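The restricted Boltzmann machine underlying the proposed model can be illustrated with a plain binary RBM trained by one-step contrastive divergence (CD-1), a standard training rule that is not necessarily the paper's. This sketch uses only binary visible units and omits the multiple discrete-continuous layer and the variational formulation; the names and learning rate are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cd1_step(v0, W, b_vis, b_hid, lr=0.05, rng=None):
    """One CD-1 update for a binary RBM; v0 is a (batch, n_visible) array.

    W, b_vis and b_hid are updated in place and also returned.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visibles and back up.
    pv1 = sigmoid(h0 @ W.T + b_vis)
    ph1 = sigmoid(pv1 @ W + b_hid)
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return W, b_vis, b_hid
```

After training, the hidden-unit probabilities play the role of latent variables, and sampling back down to the visible layer generates synthetic records.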