Goto

Collaborating Authors

 Learning Graphical Models


Modelling motion primitives and their timing in biologically executed movements

Neural Information Processing Systems

Biological movement is built up of sub-blocks or motion primitives. Such primitives provide a compact representation of movement which is also desirable in robotic control applications. We analyse handwriting data to gain a better understanding of use of primitives and their timings in biological movements. Inference of the shape and the timing of primitives can be done using a factorial HMM based model, allowing the handwriting to be represented in primitive timing space. This representation provides a distribution of spikes corresponding to the primitive activations, which can also be modelled using HMM architectures. We show how the coupling of the low level primitive model, and the higher level timing model during inference can produce good reconstructions of handwriting, with shared primitives for all characters modelled. This coupled model also captures the variance profile of the dataset which is accounted for by spike timing jitter. The timing code provides a compact representation of the movement while generating a movement without an explicit timing model produces a scribbling style of output.


Collapsed Variational Inference for HDP

Neural Information Processing Systems

A wide variety of Dirichlet-multinomial'topic' models have found interesting applications inrecent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and sidestepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing thetechnique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables.


Modeling homophily and stochastic equivalence in symmetric relational data

Neural Information Processing Systems

This article discusses a latent variable model for inference and prediction of symmetric relational data. The model, based on the idea of the eigenvalue decomposition, represents the relationship between two nodes as the weighted inner-product of node-specific vectors of latent characteristics. This ``eigenmodel'' generalizes other popular latent variable models, such as latent class and distance models: It is shown mathematically that any latent class or distance model has a representation as an eigenmodel, but not vice-versa. The practical implications of this are examined in the context of three real datasets, for which the eigenmodel has as good or better out-of-sample predictive performance than the other two models.


Discovering Weakly-Interacting Factors in a Complex Stochastic Process

Neural Information Processing Systems

Dynamic Bayesian networks are structured representations of stochastic processes. Despitetheir structure, exact inference in DBNs is generally intractable. One approach to approximate inference involves grouping the variables in the process into smaller factors and keeping independent beliefs over these factors. In this paper we present several techniques for decomposing a dynamic Bayesian network automatically to enable factored inference. We examine a number of features ofa DBN that capture different types of dependencies that will cause error in factored inference. An empirical comparison shows that the most useful of these is a heuristic that estimates the mutual information introduced between factors by one step of belief propagation.


What makes some POMDP problems easy to approximate?

Neural Information Processing Systems

Point-based algorithms have been surprisingly successful in computing approximately optimalsolutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms' success often observed inthe experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NPhard. However, given a suitable set of points that "cover" an optimal reachable spacewell, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity ofPOMDP planning in practice, e.g., fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices.


Efficient Principled Learning of Thin Junction Trees

Neural Information Processing Systems

We present the first truly polynomial algorithm for learning the structure of bounded-treewidth junction trees -- an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity, and provides strong theoretical guarantees in terms of $KL$ divergence from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of random variables with only a polynomial number of mutual information computations on fixed-size subsets of variables, when the underlying distribution can be approximated by a bounded treewidth junction tree.


Convex Clustering with Exemplar-Based Models

Neural Information Processing Systems

Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approachto approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering canbe thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping.


HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation

Neural Information Processing Systems

We present a novel paradigm for statistical machine translation (SMT), based on joint modeling of word alignment and the topical aspects underlying bilingual document pairs via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this new paradigm, parallel sentence-pairs from a parallel document-pair are coupled via a certain semantic-flow, to ensure coherence of topical context in the alignment of matching words between languages, during likelihood-based training of topic-dependent translational lexicons, as well as topic representations in each language. The resulting trained HM-BiTAM can not only display topic patterns like other methods such as LDA, but now for bilingual corpora; it also offers a principled way of inferring optimal translation in a context-dependent way. Our method integrates the conventional IBM Models based on HMM --- a key component for most of the state-of-the-art SMT systems, with the recently proposed BiTAM model, and we report an extensive empirical analysis (in many way complementary to the description-oriented of our method in three aspects: word alignment, bilingual topic representation, and translation.


A New View of Automatic Relevance Determination

Neural Information Processing Systems

Automatic relevance determination (ARD), and the closely-related sparse Bayesian learning (SBL) framework, are effective tools for pruning large numbers of irrelevant features. However, popular update rules used for this process are either prohibitively slow in practice and/or heuristic in nature without proven convergence properties. This paper furnishes an alternative means of optimizing a general ARD cost function using an auxiliary function that can naturally be solved using a series of re-weighted L1 problems. The result is an efficient algorithm that can be implemented using standard convex programming toolboxes and is guaranteed to converge to a stationary point unlike existing methods. The analysis also leads to additional insights into the behavior of previous ARD updates as well as the ARD cost function. For example, the standard fixed-point updates of MacKay (1992) are shown to be iteratively solving a particular min-max problem, although they are not guaranteed to lead to a stationary point. The analysis also reveals that ARD is exactly equivalent to performing MAP estimation using a particular feature- and noise-dependent \textit{non-factorial} weight prior with several desirable properties over conventional priors with respect to feature selection. In particular, it provides a tighter approximation to the L0 quasi-norm sparsity measure than the L1 norm. Overall these results suggests alternative cost functions and update procedures for selecting features and promoting sparse solutions.


Exponential Family Predictive Representations of State

Neural Information Processing Systems

In order to represent state in controlled, partially observable, stochastic dynamical systems, some sort of sufficient statistic for history is necessary. Predictive representations ofstate (PSRs) capture state as statistics of the future. We introduce a new model of such systems called the "Exponential family PSR," which defines as state the time-varying parameters of an exponential family distribution which models n sequential observations in the future. This choice of state representation explicitly connects PSRs to state-of-the-art probabilistic modeling, which allows us to take advantage of current efforts in high-dimensional density estimation, and in particular, graphical models and maximum entropy models. We present a parameter learningalgorithm based on maximum likelihood, and we show how a variety of current approximate inference methods apply. We evaluate the quality ofour model with reinforcement learning by directly evaluating the control performance of the model.