Goto

Collaborating Authors

 Bayesian Inference


Unsupervised Detection and Tracking of Arbitrary Objects with Dependent Dirichlet Process Mixtures

arXiv.org Machine Learning

This paper proposes a technique for the unsupervised detection and tracking of arbitrary objects in videos. It is intended to reduce the need for detection and localization methods tailored to specific object types and serve as a general framework applicable to videos with varied objects, backgrounds, and image qualities. The technique uses a dependent Dirichlet process mixture (DDPM) known as the Generalized Polya Urn (GPUDDPM) to model image pixel data that can be easily and efficiently extracted from the regions in a video that represent objects. This paper describes a specific implementation of the model using spatial and color pixel data extracted via frame differencing and gives two algorithms for performing inference in the model to accomplish detection and tracking. This technique is demonstrated on multiple synthetic and benchmark video datasets that illustrate its ability to, without modification, detect and track objects with diverse physical characteristics moving over non-uniform backgrounds and through occlusion.


Probability Bracket Notation, Multivariable Systems and Static Bayesian Networks

arXiv.org Artificial Intelligence

Probability Bracket Notation (PBN) is applied to systems of multiple random variables for preliminary study of static Bayesian Networks (BN) and Probabilistic Graphic Models (PGM). The famous Student BN Example is explored to show the local independences and reasoning power of a BN. Software package Elvira is used to graphically display the student BN. Our investigation shows that PBN provides a consistent and convenient alternative to manipulate many expressions related to joint, marginal and conditional probability distributions in static BN.


Inference in Probabilistic Logic Programs with Continuous Random Variables

arXiv.org Artificial Intelligence

Probabilistic Logic Programming (PLP), exemplified by Sato and Kameya's PRISM, Poole's ICL, Raedt et al's ProbLog and Vennekens et al's LPAD, is aimed at combining statistical and logical knowledge representation and inference. A key characteristic of PLP frameworks is that they are conservative extensions to non-probabilistic logic programs which have been widely used for knowledge representation. PLP frameworks extend traditional logic programming semantics to a distribution semantics, where the semantics of a probabilistic logic program is given in terms of a distribution over possible models of the program. However, the inference techniques used in these works rely on enumerating sets of explanations for a query answer. Consequently, these languages permit very limited use of random variables with continuous distributions. In this paper, we present a symbolic inference procedure that uses constraints and represents sets of explanations without enumeration. This permits us to reason over PLPs with Gaussian or Gamma-distributed random variables (in addition to discrete-valued random variables) and linear equality constraints over reals. We develop the inference procedure in the context of PRISM; however the procedure's core ideas can be easily applied to other PLP languages as well. An interesting aspect of our inference procedure is that PRISM's query evaluation process becomes a special case in the absence of any continuous random variables in the program. The symbolic inference procedure enables us to reason over complex probabilistic models such as Kalman filters and a large subclass of Hybrid Bayesian networks that were hitherto not possible in PLP frameworks. (To appear in Theory and Practice of Logic Programming).


Automatic Relevance Determination in Nonnegative Matrix Factorization with the \beta-Divergence

arXiv.org Machine Learning

This paper addresses the estimation of the latent dimensionality in nonnegative matrix factorization (NMF) with the \beta-divergence. The \beta-divergence is a family of cost functions that includes the squared Euclidean distance, Kullback-Leibler and Itakura-Saito divergences as special cases. Learning the model order is important as it is necessary to strike the right balance between data fidelity and overfitting. We propose a Bayesian model based on automatic relevance determination in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior. A family of majorization-minimization algorithms is proposed for maximum a posteriori (MAP) estimation. A subset of scale parameters is driven to a small lower bound in the course of inference, with the effect of pruning the corresponding spurious components. We demonstrate the efficacy and robustness of our algorithms by performing extensive experiments on synthetic data, the swimmer dataset, a music decomposition example and a stock price prediction task.


Predicting human preferences using the block structure of complex social networks

arXiv.org Machine Learning

With ever-increasing available data, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. Understanding and predicting preferences is also important from a fundamental point of view, as part of what has been called a "new" computational social science. Here, we propose a novel approach based on stochastic block models, which have been developed by sociologists as plausible models of complex networks of social interactions. Our model is in the spirit of predicting individuals' preferences based on the preferences of others but, rather than fitting a particular model, we rely on a Bayesian approach that samples over the ensemble of all possible models. We show that our approach is considerably more accurate than leading recommender algorithms, with major relative improvements between 38% and 99% over industry-level algorithms. Besides, our approach sheds light on decision-making processes by identifying groups of individuals that have consistently similar preferences, and enabling the analysis of the characteristics of those groups.


Partial Gaussian Graphical Model Estimation

arXiv.org Machine Learning

For such Gaussian graphical models (GGMs), it is usually assumed that a given variable can bepredicted by a small numberof other variables. This assumption implies that the precision matrix is sparse. Therefore estimating Gaussian graphical model can be reduced to the problem of estimating a sparse precision matrix. One approach to sparse precision matrix estimation is covariance selection or neighborhood selection (Dempster, 1972; Meinshausen & Bühlmann, 2006), which tries to estimate each row (or column) of the precision matrix by predicting the corresponding variable using a sparse linear combination of other variables. An alternative formulation is maximum-likelihood estimation method that directly estimate the full precision matrix.


The Issue-Adjusted Ideal Point Model

arXiv.org Machine Learning

Legislative behavior centers around the votes made by lawmakers. These votes are captured in roll call data, a matrix with lawmakers in the rows and proposed legislation in the columns. We illustrate a sample of roll call votes for the United States Senate in Figure 1. The seminal work of Poole and Rosenthal (1985) introduced the ideal point model, using roll call data to infer the latent political positions of the lawmakers. The ideal point model is a latent factor model of binary data and an application of item-response theory (Lord 1980) to roll call data. It gives each lawmaker a latent political position along a single dimension and then uses these points (called the ideal points) in a model of the votes.


Bayesian Mixture Models for Frequent Itemset Discovery

arXiv.org Machine Learning

In binary-transaction data-mining, traditional frequent itemset mining often produces results which are not straightforward to interpret. To overcome this problem, probability models are often used to produce more compact and conclusive results, albeit with some loss of accuracy. Bayesian statistics have been widely used in the development of probability models in machine learning in recent years and these methods have many advantages, including their abilities to avoid overfitting. In this paper, we develop two Bayesian mixture models with the Dirichlet distribution prior and the Dirichlet process (DP) prior to improve the previous non-Bayesian mixture model developed for transaction dataset mining. We implement the inference of both mixture models using two methods: a collapsed Gibbs sampling scheme and a variational approximation algorithm. Experiments in several benchmark problems have shown that both mixture models achieve better performance than a non-Bayesian mixture model. The variational algorithm is the faster of the two approaches while the Gibbs sampling method achieves a more accurate results. The Dirichlet process mixture model can automatically grow to a proper complexity for a better approximation. Once the model is built, it can be very fast to query and run analysis on (typically 10 times faster than Eclat, as we will show in the experiment section). However, these approaches also show that mixture models underestimate the probabilities of frequent itemsets. Consequently, these models have a higher sensitivity but a lower specificity.


A Bayesian Nonparametric Approach to Image Super-resolution

arXiv.org Machine Learning

Super-resolution methods form high-resolution images from low-resolution images. In this paper, we develop a new Bayesian nonparametric model for super-resolution. Our method uses a beta-Bernoulli process to learn a set of recurring visual patterns, called dictionary elements, from the data. Because it is nonparametric, the number of elements found is also determined from the data. We test the results on both benchmark and natural images, comparing with several other models from the research literature. We perform large-scale human evaluation experiments to assess the visual quality of the results. In a first implementation, we use Gibbs sampling to approximate the posterior. However, this algorithm is not feasible for large-scale data. To circumvent this, we then develop an online variational Bayes (VB) algorithm. This algorithm finds high quality dictionaries in a fraction of the time needed by the Gibbs sampler.


Scaling Multidimensional Inference for Structured Gaussian Processes

arXiv.org Machine Learning

Exact Gaussian Process (GP) regression has O(N^3) runtime for data size N, making it intractable for large N. Many algorithms for improving GP scaling approximate the covariance with lower rank matrices. Other work has exploited structure inherent in particular covariance functions, including GPs with implied Markov structure, and equispaced inputs (both enable O(N) runtime). However, these GP advances have not been extended to the multidimensional input setting, despite the preponderance of multidimensional applications. This paper introduces and tests novel extensions of structured GPs to multidimensional inputs. We present new methods for additive GPs, showing a novel connection between the classic backfitting method and the Bayesian framework. To achieve optimal accuracy-complexity tradeoff, we extend this model with a novel variant of projection pursuit regression. Our primary result -- projection pursuit Gaussian Process Regression -- shows orders of magnitude speedup while preserving high accuracy. The natural second and third steps include non-Gaussian observations and higher dimensional equispaced grid methods. We introduce novel techniques to address both of these necessary directions. We thoroughly illustrate the power of these three advances on several datasets, achieving close performance to the naive Full GP at orders of magnitude less cost.