Goto

Collaborating Authors

 Directed Networks


Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Neural Information Processing Systems

Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from prior constructions that rely on tree or chain structures, allowing mixture models to be coupled more flexibly. In addition, we derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling.


Learning as MAP Inference in Discrete Graphical Models

Neural Information Processing Systems

We present a new formulation for attacking binary classification problems. Instead of relying on convex losses and regularisers such as in SVMs, logistic regression and boosting, or instead non-convex but continuous formulations such as those encountered in neural networks and deep belief networks, our framework entails a non-convex but \emph{discrete} formulation, where estimation amounts to finding a MAP configuration in a graphical model whose potential functions are low-dimensional discrete surrogates for the misclassification loss. We argue that such a discrete formulation can naturally account for a number of issues that are typically encountered in either the convex or the continuous non-convex paradigms, or both. By reducing the learning problem to a MAP inference problem, we can immediately translate the guarantees available for many inference settings to the learning problem itself. We empirically demonstrate in a number of experiments that this approach is promising in dealing with issues such as severe label noise, while still having global optimality guarantees. Due to the discrete nature of the formulation, it also allows for \emph{direct} regularisation through cardinality-based penalties, such as the $\ell_0$ pseudo-norm, thus providing the ability to perform feature selection and trade-off interpretability and predictability in a principled manner. We also outline a number of open problems arising from the formulation.


Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

Neural Information Processing Systems

We describe a latent variable model for supervised dimensionality reduction and distance metric learning. The model discovers linear projections of high dimensional data that shrink the distance between similarly labeled inputs and expand the distance between differently labeled ones. The modelโ€™s continuous latent variables locate pairs of examples in a latent space of lower dimensionality. The model differs significantly from classical factor analysis in that the posterior distribution over these latent variables is not always multivariate Gaussian. Nevertheless we show that inference is completely tractable and derive an Expectation-Maximization (EM) algorithm for parameter estimation. We also compare the model to other approaches in distance metric learning. The modelโ€™s main advantage is its simplicity: at each iteration of the EM algorithm, the distance metric is re-estimated by solving an unconstrained least-squares problem. Experiments show that these simple updates are highly effective.


Volume Regularization for Binary Classification

Neural Information Processing Systems

We introduce a large-volume box classification for binary prediction, which maintains a subset of weight vectors, and specifically axis-aligned boxes. Our learning algorithm seeks for a box of large volume that contains ``simple'' weight vectors which most of are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound and it is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of $30$ NLP datasets and binarized USPS optical character recognition datasets.


Random function priors for exchangeable arrays with applications to graphs and relational data

Neural Information Processing Systems

A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlyingrelations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases.


Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

Neural Information Processing Systems

This paper describes a new acoustic model based on variational Gaussian process dynamical system (VGPDS) for phoneme classification. The proposed model overcomes the limitations of the classical HMM in modeling the real speech data, by adopting a nonlinear and nonparametric model. In our model, the GP prior on the dynamics function enables representing the complex dynamic structure of speech, while the GP prior on the emission function successfully models the global dependency over the observations. Additionally, we introduce variance constraint to the original VGPDS for mitigating sparse approximation error of the kernel matrix. The effectiveness of the proposed model is demonstrated with extensive experimental results including parameter estimation, classification performance on the synthetic and benchmark datasets.


Bayesian Probabilistic Co-Subspace Addition

Neural Information Processing Systems

For modeling data matrices, this paper introduces Probabilistic Co-Subspace Addition (PCSA) model by simultaneously capturing the dependent structures among both rows and columns. Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of the linear mappings of two features, which distribute in the row-wise and column-wise latent subspaces. Consequently, it captures the dependencies among entries intricately, and is able to model the non-Gaussian and heteroscedastic density. Variational inference is proposed on PCSA for approximate Bayesian learning, where the updating for posteriors is formulated into the problem of solving Sylvester equations. Furthermore, PCSA is extended to tackling and filling missing values, to adapting its sparseness, and to modelling tensor data. In comparison with several state-of-art approaches, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values.


Cardinality Restricted Boltzmann Machines

Neural Information Processing Systems

The Restricted Boltzmann Machine (RBM) is a popular density model that is also good for extracting features. A main source of tractability in RBM models is the model's assumption that given an input, hidden units activate independently from one another. Sparsity and competition in the hidden representation is believed to be beneficial, and while an RBM with competition among its hidden units would acquire some of the attractive properties of sparse coding, such constraints are not added due to the widespread belief that the resulting model would become intractable. In this work, we show how a dynamic programming algorithm developed in 1981 can be used to implement exact sparsity in the RBM's hidden units. We then expand on this and show how to pass derivatives through a layer of exact sparsity, which makes it possible to fine-tune a deep belief network (DBN) consisting of RBMs with sparse hidden layers. We show that sparsity in the RBM's hidden layer improves the performance of both the pre-trained representations and of the fine-tuned model.


Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Neural Information Processing Systems

Links between probabilistic and non-probabilistic learning algorithms can arise by performing small-variance asymptotics, i.e., letting the variance of particular distributions in a graphical model go to zero. For instance, in the context of clustering, such an approach yields precise connections between the k-means and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that feature the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis.


Probabilistic Event Cascades for Alzheimer's disease

Neural Information Processing Systems

Accurate and detailed models of the progression of neurodegenerative diseases such as Alzheimer's (AD) are crucially important for reliable early diagnosis and the determination and deployment of effective treatments. In this paper, we introduce the ALPACA (Alzheimer's disease Probabilistic Cascades) model, a generative model linking latent Alzheimer's progression dynamics to observable biomarker data. In contrast with previous works which model disease progression as a fixed ordering of events, we explicitly model the variability over such orderings among patients which is more realistic, particularly for highly detailed disease progression models. We describe efficient learning algorithms for ALPACA and discuss promising experimental results on a real cohort of Alzheimer's patients from the Alzheimer's Disease Neuroimaging Initiative.