Goto

Collaborating Authors

 Learning Graphical Models


Minimax Optimal Bayesian Aggregation

arXiv.org Machine Learning

It is generally believed that ensemble approaches, which combine multiple algorithms or models, can outperform any single algorithm at machine learning tasks, such as prediction. In this paper, we propose Bayesian convex and linear aggregation approaches motivated by regression applications. We show that the proposed approach is minimax optimal when the true data-generating model is a convex or linear combination of models in the list. Moreover, the method can adapt to sparsity structure in which certain models should receive zero weights, and the method is tuning parameter free unlike competitors. More generally, under an M-open view when the truth falls outside the space of all convex/linear combinations, our theory suggests that the posterior measure tends to concentrate on the best approximation of the truth at the minimax rate. We illustrate the method through simulation studies and several applications.


Object Tracking via Non-Euclidean Geometry: A Grassmann Approach

arXiv.org Machine Learning

A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovementioned variations. We use affine subspaces not only to represent the object, but also the candidate areas that the object may occupy. We furthermore propose a novel approach to measure affine subspace-to-subspace distance via the use of non-Euclidean geometry of Grassmann manifolds. The tracking problem is then considered as an inference task in a Markov Chain Monte Carlo framework via particle filtering. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILtrack.


Robust Asymmetric Clustering

arXiv.org Machine Learning

Contaminated mixture models are developed for model-based clustering of data with asymmetric clusters as well as spurious points, outliers, and/or noise. Specifically, we introduce a contaminated mixture of contaminated shifted asymmetric Laplace distributions and a contaminated mixture of contaminated skew-normal distributions. In each case, mixture components have a parameter controlling the proportion of bad points (i.e., spurious points, outliers, and/or noise) and one specifying the degree of contamination. A very important feature of our approaches is that these parameters do not have to be specified a priori. Expectation-conditional maximization algorithms are outlined for parameter estimation and the number of components is selected using the Bayesian information criterion. The performance of our approaches is illustrated on artificial and real data.


Bayesian Sample Size Determination of Vibration Signals in Machine Learning Approach to Fault Diagnosis of Roller Bearings

arXiv.org Machine Learning

Sample size determination for a data set is an important statistical process for analyzing the data to an optimum level of accuracy and using minimum computational work. The applications of this process are credible in every domain which deals with large data sets and high computational work. This study uses Bayesian analysis for determination of minimum sample size of vibration signals to be considered for fault diagnosis of a bearing using pre-defined parameters such as the inverse standard probability and the acceptable margin of error. Thus an analytical formula for sample size determination is introduced. The fault diagnosis of the bearing is done using a machine learning approach using an entropy-based J48 algorithm. The following method will help researchers involved in fault diagnosis to determine minimum sample size of data for analysis for a good statistical stability and precision.


The IBMAP approach for Markov networks structure learning

arXiv.org Artificial Intelligence

In this work we consider the problem of learning the structure of Markov networks from data. We present an approach for tackling this problem called IBMAP, together with an efficient instantiation of the approach: the IBMAP-HC algorithm, designed for avoiding important limitations of existing independence-based algorithms. These algorithms proceed by performing statistical independence tests on data, trusting completely the outcome of each test. In practice tests may be incorrect, resulting in potential cascading errors and the consequent reduction in the quality of the structures learned. IBMAP contemplates this uncertainty in the outcome of the tests through a probabilistic maximum-a-posteriori approach. The approach is instantiated in the IBMAP-HC algorithm, a structure selection strategy that performs a polynomial heuristic local search in the space of possible structures. We present an extensive empirical evaluation on synthetic and real data, showing that our algorithm outperforms significantly the current independence-based algorithms, in terms of data efficiency and quality of learned structures, with equivalent computational complexities. We also show the performance of IBMAP-HC in a real-world application of knowledge discovery: EDAs, which are evolutionary algorithms that use structure learning on each generation for modeling the distribution of populations. The experiments show that when IBMAP-HC is used to learn the structure, EDAs improve the convergence to the optimum.


Bayesian Inference for NMR Spectroscopy with Applications to Chemical Quantification

arXiv.org Machine Learning

Nuclear magnetic resonance (NMR) spectroscopy exploits the magnetic properties of atomic nuclei to discover the structure, reaction state and chemical environment of molecules. We propose a probabilistic generative model and inference procedures for NMR spectroscopy. Specifically, we use a weighted sum of trigonometric functions undergoing exponential decay to model free induction decay (FID) signals. We discuss the challenges in estimating the components of this general model -- amplitudes, phase shifts, frequencies, decay rates, and noise variances -- and offer practical solutions. We compare with conventional Fourier transform spectroscopy for estimating the relative concentrations of chemicals in a mixture, using synthetic and experimentally acquired FID signals. We find the proposed model is particularly robust to low signal to noise ratios (SNR), and overlapping peaks in the Fourier transform of the FID, enabling accurate predictions (e.g., 1% sensitivity at low SNR) which are not possible with conventional spectroscopy (5% sensitivity).


Accelerating ABC methods using Gaussian processes

arXiv.org Machine Learning

Approximate Bayesian computation (ABC) methods are used to approximate posterior distributions using simulation rather than likelihood calculations. We introduce Gaussian process (GP) accelerated ABC, which we show can significantly reduce the number of simulations required. As computational resource is usually the main determinant of accuracy in ABC, GP-accelerated methods can thus enable more accurate inference in some models. GP models of the unknown log-likelihood function are used to exploit continuity and smoothness, reducing the required computation. We use a sequence of models that increase in accuracy, using intermediate models to rule out regions of the parameter space as implausible. The methods will not be suitable for all problems, but when they can be used, can result in significant computational savings. For the Ricker model, we are able to achieve accurate approximations to the posterior distribution using a factor of 100 fewer simulator evaluations than comparable Monte Carlo approaches, and for a population genetics model we are able to approximate the exact posterior for the first time.


Scaling Nonparametric Bayesian Inference via Subsample-Annealing

arXiv.org Machine Learning

We describe an adaptation of the simulated annealing algorithm to nonparametric clustering and related probabilistic models. This new algorithm learns nonparametric latent structure over a growing and constantly churning subsample of training data, where the portion of data subsampled can be interpreted as the inverse temperature beta(t) in an annealing schedule. Gibbs sampling at high temperature (i.e., with a very small subsample) can more quickly explore sketches of the final latent state by (a) making longer jumps around latent space (as in block Gibbs) and (b) lowering energy barriers (as in simulated annealing). We prove subsample annealing speeds up mixing time N^2 -> N in a simple clustering model and exp(N) -> N in another class of models, where N is data size. Empirically subsample-annealing outperforms naive Gibbs sampling in accuracy-per-wallclock time, and can scale to larger datasets and deeper hierarchical models. We demonstrate improved inference on million-row subsamples of US Census data and network log data and a 307-row hospital rating dataset, using a Pitman-Yor generalization of the Cross Categorization model.


Le Cam meets LeCun: Deficiency and Generic Feature Learning

arXiv.org Machine Learning

"Deep Learning" methods attempt to learn generic features in an unsupervised fashion from a large unlabelled data set. These generic features should perform as well as the best hand crafted features for any learning problem that makes use of this data. We provide a definition of generic features, characterize when it is possible to learn them and provide algorithms closely related to the deep belief network and autoencoders of deep learning. In order to do so we use the notion of deficiency distance and illustrate its value in studying certain general learning problems.


Asymptotic Accuracy of Distribution-Based Estimation for Latent Variables

arXiv.org Machine Learning

Hierarchical statistical models are widely employed in information science and data engineering. The models consist of two types of variables: observable variables that represent the given data and latent variables for the unobservable labels. An asymptotic analysis of the models plays an important role in evaluating the learning process; the result of the analysis is applied not only to theoretical but also to practical situations, such as optimal model selection and active learning. There are many studies of generalization errors, which measure the prediction accuracy of the observable variables. However, the accuracy of estimating the latent variables has not yet been elucidated. For a quantitative evaluation of this, the present paper formulates distribution-based functions for the errors in the estimation of the latent variables. The asymptotic behavior is analyzed for both the maximum likelihood and the Bayes methods.