 Uncertainty


Uncertainty Estimation via Stochastic Batch Normalization

arXiv.org Machine Learning

In this work, we investigate the Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximizes the lower bound of its marginalized log-likelihood. Then, according to the new probabilistic model, we design an algorithm which acts consistently during train and test. However, inference becomes computationally inefficient. To reduce memory and computational cost, we propose Stochastic Batch Normalization -- an efficient approximation of proper inference procedure. This method provides us with a scalable uncertainty estimation technique. We demonstrate the performance of Stochastic Batch Normalization on popular architectures (including deep convolutional architectures: VGG-like and ResNets) for MNIST and CIFAR-10 datasets.


Predictor Variable Prioritization in Nonlinear Models: A Genetic Association Case Study

arXiv.org Machine Learning

The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel, interpretable, and computationally efficient way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other nonlinear methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and an Arabidopsis thaliana QTL mapping study, we show that applying RATE enables an explanation for this improved performance.


Copula Index for Detecting Dependence and Monotonicity between Stochastic Signals

arXiv.org Machine Learning

This paper introduces a nonparametric copula-based index for detecting the strength and monotonicity structure of linear and nonlinear statistical dependence between pairs of random variables or stochastic signals. Our index, termed Copula Index for Detecting Dependence and Monotonicity (CIM), satisfies several desirable properties of measures of association, including Rényi's properties, the data processing inequality (DPI), and consequently self-equitability. Synthetic data simulations reveal that the statistical power of CIM compares favorably to other state-of-the-art measures of association that are proven to satisfy the DPI. Simulation results with real-world data reveal the CIM's unique ability to detect the monotonicity structure among stochastic signals to find interesting dependencies in large datasets. Additionally, simulations show that the CIM compares favorably to estimators of mutual information when discovering Markov network structure.


Momentum-Space Renormalization Group Transformation in Bayesian Image Modeling by Gaussian Graphical Model

arXiv.org Machine Learning

A new Bayesian modeling method is proposed by combining the maximization of the marginal likelihood with a momentum-space renormalization group transformation for Gaussian graphical models. Moreover, we present a scheme for computing the statistical averages of hyperparameters and mean square errors in our proposed method based on a momentum-space renormalization transformation.


Learning non-Gaussian Time Series using the Box-Cox Gaussian Process

arXiv.org Machine Learning

A Gaussian process (GP) [1] is a prior distribution over functions with a support that includes a wide class of phenomena via the design of its mean and covariance functions, the parameters of which provide meaningful interpretation of the process at hand. Beyond regression [2], GPs have been extensively used in the last two decades for classification [3], density estimation [4], filter design [5], model identification [6] and optimisation [7]. In general terms, all these generative models have two stages: The latent process is modelled as a GP and the observation is modelled (conditional to the latent process) as a non-Gaussian variable. This class of models is referred to as GP with non-Gaussian likelihood, or as Generalised GPs. These usually consider likelihood functions from the exponential family such as the Laplace, Poisson, beta and gamma distributions [8]. A well-known example is the GP classification model, where the classes are represented by the output of an activation neuron into which a latent GP is fed. A slightly different approach to non-Gaussian models, which is not constrained to the exponential family, is the warped GP (WGP, [9]). The WGP models non-Gaussian data by assuming that there is a transformation φ such that the observations can be passed through φ to yield a GP; therefore, the likelihood function of this model is not designed directly but, rather, induced by the transformation (a.k.a.


Basics of Bayesian Decision Theory

@machinelearnbot

The use of formal statistical methods to analyse quantitative data in data science has increased considerably over the last few years. One such approach, Bayesian Decision Theory (BDT), also known as Bayesian Hypothesis Testing and Bayesian inference, is a fundamental statistical approach that quantifies the tradeoffs between various decisions using distributions and costs that accompany such decisions. In pattern recognition it is used for designing classifiers making the assumption that the problem is posed in probabilistic terms, and that all of the relevant probability values are known. Generally, we don't have such perfect information but it is a good place to start when studying machine learning, statistical inference, and detection theory in signal processing. BDT also has many applications in science, engineering, and medicine.
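As a rough illustration of how distributions and costs combine into a decision, the sketch below picks the action with the lowest posterior expected loss for a two-state problem. The function name `bayes_decision`, the prior, the likelihood values, and the loss matrix are all made up for this example and do not come from the article; as the text notes, in practice these probabilities are rarely known exactly.

```python
import numpy as np

def bayes_decision(priors, likelihoods, loss):
    """Pick the action with minimum posterior expected loss.

    priors[s]      : P(state = s)
    likelihoods[s] : p(x | state = s) for the single observed x
    loss[a][s]     : cost of taking action a when the true state is s
    (All names and values here are illustrative assumptions.)
    """
    priors = np.asarray(priors, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    loss = np.asarray(loss, dtype=float)

    # Bayes' rule: posterior is proportional to prior times likelihood.
    joint = priors * likelihoods
    posterior = joint / joint.sum()

    # Expected loss (conditional risk) of each action under the posterior.
    risk = loss @ posterior
    return posterior, risk, int(np.argmin(risk))

# Hypothetical screening problem: state 0 = healthy, state 1 = sick.
priors = [0.9, 0.1]
likelihoods = [0.2, 0.8]      # p(positive test | healthy), p(positive test | sick)
loss = [[0.0, 10.0],          # action 0: do nothing (costly if the patient is sick)
        [1.0, 0.0]]           # action 1: treat (small cost if the patient is healthy)

posterior, risk, action = bayes_decision(priors, likelihoods, loss)
print("posterior:", posterior)
print("expected loss per action:", risk)
print("chosen action:", action)
```

With these assumed numbers the posterior still favours the healthy state, yet the asymmetric costs make treating the lower-risk action, which is the kind of tradeoff the decision-theoretic view is meant to expose.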


Topology Estimation using Graphical Models in Multi-Phase Power Distribution Grids

arXiv.org Machine Learning

The distribution grid is the medium and low voltage part of a large power system. Structurally, the majority of distribution networks operate radially, such that energized lines form a collection of trees, i.e., a forest, with a substation being at the root of any tree. The operational topology/forest may change from time to time; however, tracking these changes, even though important for the distribution grid operation and control, is hindered by limited real-time monitoring. This paper develops a learning framework to reconstruct the radial operational structure of the distribution grid from synchronized voltage measurements in the grid subject to the exogenous fluctuations in nodal power consumption. To detect operational lines, our learning algorithm uses conditional independence tests for continuous random variables that are applicable to a wide class of probability distributions of the nodal consumption and Gaussian injections in particular. Moreover, our algorithm applies to the practical case of unbalanced three-phase power flow. Algorithm performance is validated on AC power flow simulations over IEEE distribution grid test cases.


Convergence Rates of Latent Topic Models Under Relaxed Identifiability Conditions

arXiv.org Machine Learning

In this paper we study the frequentist convergence rate for the Latent Dirichlet Allocation (Blei et al., 2003) topic models. We show that the maximum likelihood estimator converges to one of the finitely many equivalent parameters in Wasserstein's distance metric at a rate of $n^{-1/4}$ without assuming separability or non-degeneracy of the underlying topics and/or the existence of more than three words per document, thus generalizing the previous works of Anandkumar et al. (2012, 2014) from an information-theoretical perspective. We also show that the $n^{-1/4}$ convergence rate is optimal in the worst case.


Generative Bridging Network in Neural Sequence Prediction

arXiv.org Machine Learning

In order to alleviate data sparsity and over-fitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network). Unlike MLE directly maximizing the conditional likelihood, the bridge extends the point-wise ground truth to a bridge distribution conditioned on it, and the generator is optimized to minimize their KL-divergence. Three different GBNs, namely uniform GBN, language-model GBN and coaching GBN, are proposed to penalize confidence, enhance language smoothness and relieve learning burden. Experiments conducted on two recognized sequence prediction tasks (machine translation and abstractive text summarization) show that our proposed GBNs can yield significant improvements over strong baselines. Furthermore, by analyzing samples drawn from different bridges, expected influences on the generator are verified.
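To make the contrast with plain MLE concrete, the sketch below computes a KL divergence between a target distribution spread around the ground-truth token and a generator's predicted distribution. It is only a stand-in: the smoothed target built by `smoothed_bridge`, the direction KL(bridge || generator), and the toy logits are assumptions for illustration, not the bridge modules defined in the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def smoothed_bridge(target_index, vocab_size, eps=0.1):
    """Hypothetical bridge: most mass on the ground truth, eps spread uniformly.

    This is a label-smoothing-style stand-in, not the paper's bridge.
    """
    p = np.full(vocab_size, eps / vocab_size)
    p[target_index] += 1.0 - eps
    return p

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p == 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy 4-word vocabulary; the ground-truth token is index 2.
generator_probs = softmax([0.5, 0.1, 2.0, -1.0])
bridge = smoothed_bridge(target_index=2, vocab_size=4, eps=0.1)

loss = kl_divergence(bridge, generator_probs)
print("KL(bridge || generator):", loss)
```

Setting `eps=0` collapses the target to the point-wise ground truth, in which case this quantity reduces to the usual negative log-likelihood of the correct token.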


Impacts of Dirty Data: an Experimental Evaluation

arXiv.org Machine Learning

Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results could be applied to the selection of an appropriate algorithm given the data quality, and to determining the share of data to clean. However, little research has explored this relationship. Motivated by this, this paper conducts an experimental comparison of the effects of missing, inconsistent and conflicting data on classification, clustering, and regression algorithms. Based on the experimental findings, we provide guidelines for algorithm selection and data cleaning.