Goto

Collaborating Authors

 Uncertainty


Learning Graphical Models with Mercer Kernels

Neural Information Processing Systems

We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on an equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.


Incremental Gaussian Processes

Neural Information Processing Systems

In this paper, we consider Tipping's relevance vector machine (RVM) [1] and formalize an incremental training strategy as a variant of the expectation-maximization (EM) algorithm that we call Subspace EM (SSEM). Working with a subset of active basis functions, the sparsity of the RVM solution will ensure that the number of basis functions and thereby the computational complexity is kept low. We also introduce a mean field approach to the intractable classification model that is expected to give a very good approximation to exact Bayesian inference and contains the Laplace approximation as a special case.


Discriminative Densities from Maximum Contrast Estimation

Neural Information Processing Systems

We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization of the contrast is equivalent to minimization of an approximation of the Bayes risk.


Transductive and Inductive Methods for Approximate Gaussian Process Regression

Neural Information Processing Systems

Gaussian process regression allows a simple analytical treatment of exact Bayesian inference and has been found to provide good performance, yet scales badly with the number of training data. In this paper we compare several approaches towards scaling Gaussian processes regression to large data sets: the subset of representers method, the reduced rank approximation, online Gaussian processes, and the Bayesian committee machine. Furthermore we provide theoretical insight into some of our experimental results. We found that subset of representers methods can give good and particularly fast predictions for data sets with high and medium noise levels. On complex low noise data sets, the Bayesian committee machine achieves significantly better accuracy, yet at a higher computational cost.


Learning with Multiple Labels

Neural Information Processing Systems

In this paper, we study a special kind of learning problem in which each training instance is given a set of (or distribution over) candidate class labels and only one of the candidate labels is the correct one. Such a problem can occur, e.g., in an information retrieval setting where a set of words is associated with an image, or if classes labels are organized hierarchically. We propose a novel discriminative approach for handling the ambiguity of class labels in the training examples. The experiments with the proposed approach over five different UCI datasets show that our approach is able to find the correct label among the set of candidate labels and actually achieve performance close to the case when each training instance is given a single correct label. In contrast, naIve methods degrade rapidly as more ambiguity is introduced into the labels. 1 Introduction Supervised and unsupervised learning problems have been extensively studied in the machine learning literature. In supervised classification each training instance is associated with a single class label, while in unsupervised classification (i.e.


Handling Missing Data with Variational Bayesian Learning of ICA

Neural Information Processing Systems

Missing data is common in real-world datasets and is a problem for many estimation techniques. We have developed a variational Bayesian method to perform Independent Component Analysis (ICA) on high-dimensional data containing missing entries. Missing data are handled naturally in the Bayesian framework by integrating the generative density model. Modeling the distributions of the independent sources with mixture of Gaussians allows sources to be estimated with different kurtosis and skewness. The variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems. This allows direct probability estimation of missing values in the high dimensional space and avoids dimension reduction preprocessing which is not feasible with missing data.


Exact MAP Estimates by (Hyper)tree Agreement

Neural Information Processing Systems

We describe a method for computing provably exact maximum a posteriori (MAP) estimates for a subclass of problems on graphs with cycles. The basic idea is to represent the original problem on the graph with cycles as a convex combination of tree-structured problems. A convexity argument then guarantees that the optimal value of the original problem (i.e., the log probability of the MAP assignment) is upper bounded by the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree problems share an optimal configuration in common. An important implication is that any such shared configuration must also be the MAP configuration for the original problem. Next we develop a tree-reweighted max-product algorithm for attempting to find convex combinations of tree-structured problems that share a common optimum. We give necessary and sufficient conditions for a fixed point to yield the exact MAP estimate. An attractive feature of our analysis is that it generalizes naturally to convex combinations of hypertree-structured distributions.


Using Tarjan's Red Rule for Fast Dependency Tree Construction

Neural Information Processing Systems

We focus on the problem of efficient learning of dependency trees. It is well-known that given the pairwise mutual information coefficients, a minimum-weight spanning tree algorithm solves this problem exactly and in polynomial time. However, for large data-sets it is the construction of the correlation matrix that dominates the running time. We have developed a new spanning-tree algorithm which is capable of exploiting partial knowledge about edge weights. The partial knowledge we maintain is a probabilistic confidence interval on the coefficients, which we derive by examining just a small sample of the data. The algorithm is able to flag the need to shrink an interval, which translates to inspection of more data for the particular attribute pair. Experimental results show running time that is near-constant in the number of records, without significant loss in accuracy of the generated trees. Interestingly, our spanning-tree algorithm is based solely on Tarjan's red-edge rule, which is generally considered a guaranteed recipe for bad performance.


Nash Propagation for Loopy Graphical Games

Neural Information Processing Systems

We introduce NashProp, an iterative and local message-passing algorithm for computing Nash equilibria in multi-player games represented by arbitrary undirected graphs. We provide a formal analysis and experimental evidence demonstrating that NashProp performs well on large graphical games with many loops, often converging in just a dozen iterations on graphs with hundreds of nodes. NashProp generalizes the tree algorithm of (Kearns et al. 2001), and can be viewed as similar in spirit to belief propagation in probabilistic inference, and thus complements the recent work of (Vickrey and Koller 2002), who explored a junction tree approach. Thus, as for probabilistic inference, we have at least two promising general-purpose approaches to equilibria computation in graphs.


VIBES: A Variational Inference Engine for Bayesian Networks

Neural Information Processing Systems

In recent years variational methods have become a popular tool for approximate inference and learning in a wide variety of probabilistic models. For each new application, however, it is currently necessary first to derive the variational update equations, and then to implement them in application-specific code. Each of these steps is both time consuming and error prone. In this paper we describe a general purpose inference engine called VIBES ('Variational Inference for Bayesian Networks') which allows a wide variety of probabilistic models to be implemented and solved variationally without recourse to coding. New models are specified either through a simple script or via a graphical interface analogous to a drawing package. VIBES then automatically generates and solves the variational equations. We illustrate the power and flexibility of VIBES using examples from Bayesian mixture modelling.