 Bayesian Learning


Bayesian and L1 Approaches to Sparse Unsupervised Learning

arXiv.org Artificial Intelligence

The use of L1 regularisation for sparse learning has generated immense research interest, with successful application in such diverse areas as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when compared with other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1 minimising factor models, Bayesian variants of "L1", and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner and avoiding unnecessary shrinkage of non-zero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to re-assess the wide use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.
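The "unnecessary shrinkage of non-zero values" mentioned above can be illustrated with a minimal sketch. It compares soft thresholding (the proximal operator of the L1 penalty) with a keep-or-kill hard threshold. The hard threshold is only an assumed stand-in for L0-like behaviour, not the spike-and-slab inference used in the paper, and the function names and the value of `lam` are hypothetical.

```python
def soft_threshold(x: float, lam: float) -> float:
    """Proximal operator of lam * |x|: zeroes small values, shrinks the rest by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def hard_threshold(x: float, lam: float) -> float:
    """Keep-or-kill rule (L0-like): zeroes small values, leaves survivors untouched."""
    return x if abs(x) > lam else 0.0


values = [-3.0, -0.5, 0.2, 1.5, 4.0]
lam = 1.0  # illustrative penalty strength (assumption)

soft = [soft_threshold(v, lam) for v in values]
hard = [hard_threshold(v, lam) for v in values]

print("soft:", soft)  # surviving entries are pulled towards zero by lam
print("hard:", hard)  # surviving entries keep their original magnitude
```

Both rules produce the same sparsity pattern here, but only the L1 rule biases the surviving coefficients towards zero.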


On Finding Optimal Polytrees

arXiv.org Artificial Intelligence

Inferring probabilistic networks from data is a notoriously difficult task. Under various goodness-of-fit measures, finding an optimal network is NP-hard, even if restricted to polytrees of bounded in-degree. Polynomial-time algorithms are known only for rare special cases, perhaps most notably for branchings, that is, polytrees in which the in-degree of every node is at most one. Here, we study the complexity of finding an optimal polytree that can be turned into a branching by deleting some number of arcs or nodes, treated as a parameter. We show that the problem can be solved via a matroid intersection formulation in polynomial time if the number of deleted arcs is bounded by a constant. The order of the polynomial time bound depends on this constant, hence the algorithm does not establish fixed-parameter tractability when parameterized by the number of deleted arcs. We show that a restricted version of the problem allows fixed-parameter tractability and hence scales well with the parameter. We contrast this positive result by showing that if we parameterize by the number of deleted nodes, a somewhat more powerful parameter, the problem is not fixed-parameter tractable, subject to a complexity-theoretic assumption.
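The structural definitions above can be made concrete with a short sketch. It assumes a polytree is a directed graph whose underlying undirected graph has no cycle (connectivity is not required here, which is an assumption), and a branching is such a graph with in-degree at most one everywhere. It only checks structure and does not implement the optimisation algorithms of the paper; all function names are hypothetical.

```python
def has_no_undirected_cycle(n, arcs):
    """Union-find check that the arcs, ignoring direction, form a forest."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in arcs:
        ru, rv = find(u), find(v)
        if ru == rv:  # endpoints already connected: this arc closes a cycle
            return False
        parent[ru] = rv
    return True


def max_in_degree(n, arcs):
    indeg = [0] * n
    for _, v in arcs:
        indeg[v] += 1
    return max(indeg, default=0)


def is_polytree(n, arcs):
    return has_no_undirected_cycle(n, arcs)


def is_branching(n, arcs):
    return is_polytree(n, arcs) and max_in_degree(n, arcs) <= 1


# Node 2 has two parents, so this polytree is not a branching.
arcs = [(0, 2), (1, 2), (2, 3)]
# Deleting the single arc (1, 2) turns it into a branching.
arcs_after_deletion = [(0, 2), (2, 3)]

print(is_polytree(4, arcs), is_branching(4, arcs))
print(is_branching(4, arcs_after_deletion))
```

In the terminology of the abstract, this example is one arc deletion away from a branching.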


Multidimensional Membership Mixture Models

arXiv.org Machine Learning

We present the multidimensional membership mixture (M3) models where every dimension of the membership represents an independent mixture model and each data point is generated from the selected mixture components jointly. This is helpful when the data has a certain shared structure. For example, three unique means and three unique variances can effectively form a Gaussian mixture model with nine components, while requiring only six parameters to fully describe it. In this paper, we present three instantiations of M3 models (together with the learning and inference algorithms): infinite, finite, and hybrid, depending on whether the number of mixtures is fixed or not. They are built upon Dirichlet process mixture models, latent Dirichlet allocation, and a combination respectively. We then consider two applications: topic modeling and learning 3D object arrangements. Our experiments show that our M3 models achieve better performance using fewer topics than many classic topic models. We also observe that topics from the different dimensions of M3 models are meaningful and orthogonal to each other.
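The parameter count in the Gaussian example above can be checked with a small sketch. The numeric means and variances are made up for illustration, and mixing weights are ignored; this shows only the combinatorial sharing, not the M3 learning or inference algorithms.

```python
from itertools import product

# Two independent membership dimensions (illustrative values, assumptions).
means = [-2.0, 0.0, 2.0]
variances = [0.5, 1.0, 2.0]

# Each component picks one mean and one variance jointly.
components = list(product(means, variances))

n_shared_params = len(means) + len(variances)  # parameters stored once and shared
n_flat_params = 2 * len(components)            # a flat mixture stores a (mean, variance) pair per component

print(len(components), n_shared_params, n_flat_params)
```

A flat nine-component mixture would need a separate mean and variance for every component, whereas the shared structure reuses the same six values.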


Riffled Independence for Efficient Inference with Partial Rankings

Journal of Artificial Intelligence Research

Distributions over rankings are used to model data in a multitude of real world settings such as preference analysis and political elections. Modeling such distributions presents several computational challenges, however, due to the factorial size of the set of rankings over an item set. Some of these challenges are quite familiar to the artificial intelligence community, such as how to compactly represent a distribution over a combinatorially large space, and how to efficiently perform probabilistic inference with these representations. With respect to ranking, however, there is the additional challenge of what we refer to as human task complexity -- users are rarely willing to provide a full ranking over a long list of candidates, instead often preferring to provide partial ranking information. Simultaneously addressing all of these challenges -- i.e., designing a compactly representable model which is amenable to efficient inference and can be learned using partial ranking data -- is a difficult task, but is necessary if we would like to scale to problems with nontrivial size. In this paper, we show that the recently proposed riffled independence assumptions cleanly and efficiently address each of the above challenges. In particular, we establish a tight mathematical connection between the concepts of riffled independence and of partial rankings. This correspondence not only allows us to then develop efficient and exact algorithms for performing inference tasks using riffled independence based representations with partial rankings, but somewhat surprisingly, also shows that efficient inference is not possible for riffle independent models (in a certain sense) with observations which do not take the form of partial rankings. Finally, using our inference algorithm, we introduce the first method for learning riffled independence based models from partially ranked data.
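To give a sense of the sizes involved, the sketch below counts the full rankings that agree with a partial ranking. It assumes a partial ranking is represented as an ordered list of tiers, with every item in an earlier tier ranked ahead of every item in a later tier; this representation and the function names are assumptions for illustration, and the code does not implement riffled independence itself.

```python
from itertools import permutations
from math import factorial, prod


def consistent(ranking, tiers):
    """True if every item of an earlier tier precedes every item of the next tier."""
    position = {item: i for i, item in enumerate(ranking)}
    for earlier, later in zip(tiers, tiers[1:]):
        if max(position[x] for x in earlier) > min(position[y] for y in later):
            return False
    return True


items = "ABCDE"
tiers = [{"A", "B"}, {"C"}, {"D", "E"}]  # hypothetical partial ranking

n_full = factorial(len(items))
n_consistent_brute = sum(consistent(r, tiers) for r in permutations(items))
n_consistent_formula = prod(factorial(len(t)) for t in tiers)

print(n_full, n_consistent_brute, n_consistent_formula)
```

The brute-force count matches the product of tier-size factorials, which is why a partial ranking can be treated as a structured subset of the full ranking space rather than enumerated.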


A note on the lack of symmetry in the graphical lasso

arXiv.org Machine Learning

The graphical lasso (glasso) is a widely used fast algorithm for estimating sparse inverse covariance matrices. The glasso solves an L1 penalized maximum likelihood problem and is available as an R library on CRAN. The outputs from the glasso, a regularized covariance matrix estimate and a sparse inverse covariance matrix estimate, not only identify a graphical model but can also serve as intermediate inputs into multivariate procedures such as PCA, LDA, MANOVA, and others. The glasso indeed produces a covariance matrix estimate which solves the L1 penalized optimization problem in a dual sense; however, the method for producing the inverse covariance matrix estimator after this optimization is inexact and may produce asymmetric estimates. This problem is exacerbated when the amount of L1 regularization that is applied is small, which in turn is more likely to occur if the true underlying inverse covariance matrix is not sparse. The lack of symmetry can have practical consequences. First, it implies that the covariance and inverse covariance estimates are not numerical inverses of one another, and second, asymmetry can lead to negative or complex eigenvalues, rendering unusable many multivariate procedures that depend on the inverse covariance estimator. We demonstrate this problem, explain its causes, and propose possible remedies.
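The eigenvalue issue can be seen on a toy example. The sketch below measures the asymmetry of a 2x2 matrix, computes its eigenvalues in closed form, and then symmetrises it by averaging with its transpose. The matrix is made up and is not glasso output, and averaging is a generic fix that is not necessarily among the remedies proposed by the authors.

```python
import cmath


def asymmetry(m):
    """Largest absolute difference between m[i][j] and m[j][i]."""
    n = len(m)
    return max(abs(m[i][j] - m[j][i]) for i in range(n) for j in range(n))


def symmetrize(m):
    """Average the matrix with its transpose."""
    n = len(m)
    return [[(m[i][j] + m[j][i]) / 2 for j in range(n)] for i in range(n)]


def eig2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)


estimate = [[2.0, 1.5], [-1.0, 2.0]]  # hypothetical asymmetric "precision" estimate
fixed = symmetrize(estimate)

print("asymmetry:", asymmetry(estimate))
print("eigenvalues before:", eig2(estimate))  # complex pair
print("eigenvalues after: ", eig2(fixed))     # real
```

A real symmetric matrix always has real eigenvalues, so a symmetry check of this kind is a cheap safeguard before passing an estimate to downstream procedures.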


Nonlinear spectral unmixing of hyperspectral images using Gaussian processes

arXiv.org Machine Learning

This paper presents an unsupervised algorithm for nonlinear unmixing of hyperspectral images. The proposed model assumes that the pixel reflectances result from a nonlinear function of the abundance vectors associated with the pure spectral components. We assume that the spectral signatures of the pure components and the nonlinear function are unknown. The first step of the proposed method consists of the Bayesian estimation of the abundance vectors for all the image pixels and the nonlinear function relating the abundance vectors to the observations. The endmembers are subsequently estimated using Gaussian process regression. The performance of the unmixing strategy is evaluated with simulations conducted on synthetic and real data.


Considering State in Plan Recognition with Lexicalized Grammars

AAAI Conferences

This paper documents extending the ELEXIR (Engine for LEXicalized Intent Recognition) system (Geib 2009; Geib 2011) with a world model. This is a significant increase in the expressiveness of the plan recognition system and allows a number of additions to the algorithm, most significantly conditioning probabilities for recognized plans on the state of the world during execution. Since ELEXIR falls in the family of grammatical methods for plan recognition, which view the problem of plan recognition as one of parsing, this paper will also briefly discuss how this extension relates to state-of-the-art proposals in the natural language community regarding probabilistic parsing.


Statistical Anomaly Detection for Train Fleets

AAAI Conferences

We have developed a method for statistical anomaly detection which has been deployed in a tool for condition monitoring of train fleets. The tool is currently used by several railway operators around the world to inspect and visualize the occurrence of event messages generated on the trains. The anomaly detection component helps the operators to quickly find significant deviations from normal behavior and to detect early indications of possible problems. The savings in maintenance costs come mainly from avoiding costly breakdowns, and have been estimated at several million euros per year for the tool. In the long run, it is expected that maintenance costs can be reduced by between 5 and 10% by using the tool.


Bayesian Unification of Sound Source Localization and Separation with Permutation Resolution

AAAI Conferences

Sound source localization and separation with permutation resolution are essential for achieving a computational auditory scene analysis system that can extract useful information from a mixture of various sounds. Because existing methods cope separately with these problems despite their mutual dependence, the overall result with these approaches can be degraded by any failure in one of these components. This paper presents a unified Bayesian framework to solve these problems simultaneously where localization and separation are regarded as a clustering problem. Experimental results confirm that our method outperforms state-of-the-art methods in terms of the separation quality with various setups including practical reverberant environments.


Symbolic Variable Elimination for Discrete and Continuous Graphical Models

AAAI Conferences

Probabilistic reasoning in the real world often requires inference in continuous variable graphical models, yet there are few methods for exact, closed-form inference when joint distributions are non-Gaussian. To address this inferential deficit, we introduce SVE -- a symbolic extension of the well-known variable elimination algorithm to perform exact inference in an expressive class of mixed discrete and continuous variable graphical models whose conditional probability functions can be well-approximated as piecewise combinations of polynomials with bounded support. Using this representation, we show that we can compute all of the SVE operations exactly and in closed form, which crucially includes definite integration w.r.t. multivariate piecewise polynomial functions. To aid in the efficient computation and compact representation of this solution, we use an extended algebraic decision diagram (XADD) data structure that supports all SVE operations. We provide illustrative results for SVE on probabilistic inference queries inspired by robotics localization and tracking applications that mix various continuous distributions; this represents the first time a general closed-form exact solution has been proposed for this expressive class of discrete/continuous graphical models.
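Exact definite integration of a piecewise polynomial with bounded support can be sketched in a few lines for the univariate case. The representation below (a list of `(lo, hi, coeffs)` pieces with coefficients in increasing degree) and the function names are assumptions made for illustration. This is not the XADD data structure or the multivariate SVE procedure described above.

```python
from fractions import Fraction


def antiderivative_at(coeffs, x):
    """Evaluate the antiderivative of sum(c_k * x**k) at x, with zero constant term."""
    return sum(Fraction(c) * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))


def integrate(pieces, a, b):
    """Exact integral over [a, b] of a function that is zero outside its pieces."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for lo, hi, coeffs in pieces:
        left, right = max(a, Fraction(lo)), min(b, Fraction(hi))
        if left < right:  # this piece overlaps the integration range
            total += antiderivative_at(coeffs, right) - antiderivative_at(coeffs, left)
    return total


# Triangular density on [0, 2]: f(x) = x on [0, 1] and f(x) = 2 - x on [1, 2].
triangle = [(0, 1, [0, 1]), (1, 2, [2, -1])]

total_mass = integrate(triangle, 0, 2)
prob_below_half = integrate(triangle, 0, Fraction(1, 2))

print(total_mass, prob_below_half)
```

Using `Fraction` keeps every intermediate result rational, so the answers are exact rather than floating-point approximations, which mirrors the closed-form spirit of the approach.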