Goto

Collaborating Authors

 Asia


Density Estimation under Independent Similarly Distributed Sampling Assumptions

Neural Information Processing Systems

A method is proposed for semiparametric estimation where parametric and nonparametric criteria are exploited in density estimation and unsupervised learning. This is accomplished by making sampling assumptions on a dataset that smoothly interpolate between the extreme of independently distributed (or id) sample data (as in nonparametric kernel density estimators) to the extreme of independent identically distributed (or iid) sample data. This article makes independent similarly distributed (or isd) sampling assumptions and interpolates between these two using a scalar parameter. The parameter controls a Bhattacharyya affinity penalty between pairs of distributions on samples. Surprisingly, the isd method maintains certain consistency and unimodality properties akin to maximum likelihood estimation. The proposed isd scheme is an alternative for handling nonstationarity in data without making drastic hidden variable assumptions which often make estimation difficult and laden with local optima. Experiments in density estimation on a variety of datasets confirm the value of isd over iid estimation, id estimation and mixture modeling.


Bayesian Policy Learning with Trans-Dimensional MCMC

Neural Information Processing Systems

A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with one single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulations methods in situations where the reward is a rare event.


Convex Relaxations of Latent Variable Training

Neural Information Processing Systems

We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima.


A Kernel Statistical Test of Independence

Neural Information Processing Systems

Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC).


Expectation Maximization and Posterior Constraints

Neural Information Processing Systems

The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a-priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models.


Iterative Non-linear Dimensionality Reduction with Manifold Sculpting

Neural Information Processing Systems

Many algorithms have been recently developed for reducing dimensionality by projecting data onto an intrinsic nonlinear manifold. Unfortunately, existing algorithms often lose significant precision in this transformation. Manifold Sculpting is a new algorithm that iteratively reduces dimensionality by simulating surface tension in local neighborhoods. We present several experiments that show Manifold Sculpting yields more accurate results than existing algorithms with both generated and natural data-sets. Manifold Sculpting is also able to benefit from both prior dimensionality reduction efforts.


Kernel Measures of Conditional Dependence

Neural Information Processing Systems

We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.


Discovering Weakly-Interacting Factors in a Complex Stochastic Process

Neural Information Processing Systems

Dynamic Bayesian networks are structured representations of stochastic processes. Despite their structure, exact inference in DBNs is generally intractable. One approach to approximate inference involves grouping the variables in the process into smaller factors and keeping independent beliefs over these factors. In this paper we present several techniques for decomposing a dynamic Bayesian network automatically to enable factored inference. We examine a number of features of a DBN that capture different types of dependencies that will cause error in factored inference. An empirical comparison shows that the most useful of these is a heuristic that estimates the mutual information introduced between factors by one step of belief propagation.


Anytime Induction of Cost-sensitive Trees

Neural Information Processing Systems

Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits.


Measuring Neural Synchrony by Message Passing

Neural Information Processing Systems

A novel approach to measure the interdependence of two time series is proposed, referred to as "stochastic event synchrony" (SES); it quantifies the alignment of two point processes by means of the following parameters: time delay, variance of the timing jitter, fraction of "spurious" events, and average similarity of events. SES may be applied to generic one-dimensional and multidimensional point processes, however, the paper mainly focusses on point processes in time-frequency domain. The average event similarity is in that case described by two parameters: the average frequency offset between events in the time-frequency plane, and the variance of the frequency offset ("frequency jitter"); SES then consists of five parameters in total. Those parameters quantify the synchrony of oscillatory events, and hence, they provide an alternative to existing synchrony measures that quantify amplitude or phase synchrony. The pairwise alignment of point processes is cast as a statistical inference problem, which is solved by applying the maxproduct algorithm on a graphical model. The SES parameters are determined from the resulting pairwise alignment by maximum a posteriori (MAP) estimation. The proposed interdependence measure is applied to the problem of detecting anomalies in EEG synchrony of Mild Cognitive Impairment (MCI) patients; the results indicate that SES significantly improves the sensitivity of EEG in detecting MCI.