Simplifying Mixture Models through Function Approximation

Neural Information Processing Systems

The finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models.
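As a rough illustration of the grouping step only, the sketch below merges one-dimensional Gaussian components that share a cluster into a single component by matching their weight, mean, and variance. This moment-matching merge is a common stand-in and is not the upper-bound minimization the abstract describes; the names `Gaussian1D`, `merge_components`, `simplify`, and the cluster assignment are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian1D:
    weight: float
    mean: float
    var: float


def merge_components(cluster: List[Gaussian1D]) -> Gaussian1D:
    """Replace a cluster of components by one moment-matched Gaussian."""
    w = sum(c.weight for c in cluster)
    mean = sum(c.weight * c.mean for c in cluster) / w
    # Variance of the merged component: within-component variance
    # plus the spread of the component means around the merged mean.
    var = sum(c.weight * (c.var + (c.mean - mean) ** 2) for c in cluster) / w
    return Gaussian1D(w, mean, var)


def simplify(mixture: List[Gaussian1D], assignment: List[int],
             n_clusters: int) -> List[Gaussian1D]:
    """Group components by an assumed cluster assignment, then merge each group."""
    groups: List[List[Gaussian1D]] = [[] for _ in range(n_clusters)]
    for comp, k in zip(mixture, assignment):
        groups[k].append(comp)
    return [merge_components(g) for g in groups if g]


# Three components; the first two are assumed to fall in the same cluster.
mixture = [
    Gaussian1D(0.25, 0.0, 1.0),
    Gaussian1D(0.25, 1.0, 1.0),
    Gaussian1D(0.5, 10.0, 2.0),
]
simplified = simplify(mixture, [0, 0, 1], 2)
```

The simplified model keeps the total weight and the overall mean of the original mixture, which is one reason moment matching is a convenient baseline for this kind of reduction.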


Doubly Stochastic Normalization for Spectral Clustering

Neural Information Processing Systems

In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L


Nonnegative Sparse PCA

Neural Information Processing Systems

We describe a nonnegative variant of the "Sparse PCA" problem. The goal is to create a low dimensional representation from a collection of points which on the one hand maximizes the variance of the projected points and on the other uses only parts of the original coordinates, thereby creating a sparse representation. What distinguishes our problem from other Sparse PCA formulations is that the projection involves only nonnegative weights of the original coordinates -- a desired quality in various fields, including economics, bioinformatics and computer vision. Adding nonnegativity contributes to sparseness, where it enforces a partitioning of the original coordinates among the new axes. We describe a simple yet efficient iterative coordinate-descent type of scheme which converges to a local optimum of our optimization criteria, giving good results on large real-world datasets.


The Robustness-Performance Tradeoff in Markov Decision Processes

Neural Information Processing Systems

Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worst-case analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and the worst-case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto efficient policies in the performance-robustness plane when only the reward parameters are subject to uncertainty. In the more general case when the transition probabilities are also subject to error, we show that the strategy with the "optimal" tradeoff might be non-Markovian and hence is in general not tractable.


A Local Learning Approach for Clustering

Neural Information Processing Systems

We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach.



Particle Filtering for Nonparametric Bayesian Matrix Factorization

Neural Information Processing Systems

Many unsupervised learning problems can be expressed as a form of matrix factorization, reconstructing an observed data matrix as the product of two matrices of latent variables. A standard challenge in solving these problems is determining the dimensionality of the latent matrices.
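To make the dimensionality question concrete, the sketch below factorizes a synthetic data matrix with a plain truncated SVD for several assumed latent dimensions and records the reconstruction error for each. It is only an illustration of the problem setting, not the nonparametric Bayesian or particle filtering approach of the paper; the function `factorize`, the synthetic data, and the true latent dimension of 3 are all assumptions.

```python
import numpy as np


def factorize(X: np.ndarray, k: int):
    """Return Z (n x k) and A (k x d) with Z @ A the best rank-k fit to X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]   # scale the left singular vectors by singular values
    A = Vt[:k, :]
    return Z, A


# Synthetic data: a rank-3 product of latent matrices plus small noise.
rng = np.random.default_rng(0)
true_Z = rng.normal(size=(50, 3))
true_A = rng.normal(size=(3, 20))
X = true_Z @ true_A + 0.01 * rng.normal(size=(50, 20))

# Squared reconstruction error for each assumed latent dimension k.
errors = {}
for k in range(1, 6):
    Z, A = factorize(X, k)
    errors[k] = float(np.linalg.norm(X - Z @ A) ** 2)
```

In this setup the error drops sharply once k reaches the true latent dimension and then decreases only slowly, which is why the error alone cannot settle the choice of k: it never increases as k grows.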




Randomized PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension

Neural Information Processing Systems

In each trial the current instance is projected onto a probabilistically chosen low dimensional subspace. The total expected quadratic approximation error equals the total quadratic approximation error of the best subspace chosen in hindsight plus some additional term that grows linearly in the dimension of the subspace but logarithmically in the dimension of the instances.
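The comparator in this bound, the best subspace chosen in hindsight, can be computed directly once all instances are known. The sketch below does so under the assumption that instances are used uncentered, and compares its total quadratic approximation error with that of one arbitrary subspace. It does not implement the randomized online algorithm itself; the helper names and the synthetic data are hypothetical.

```python
import numpy as np


def projection_error(x: np.ndarray, basis: np.ndarray) -> float:
    """Squared distance from x to its projection onto span(basis).

    `basis` is assumed to have orthonormal columns.
    """
    residual = x - basis @ (basis.T @ x)
    return float(residual @ residual)


def best_subspace_in_hindsight(instances: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis of the k-dimensional subspace with the smallest total error."""
    C = instances.T @ instances           # sum of outer products x x^T
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    return eigvecs[:, -k:]                # top-k eigenvectors


rng = np.random.default_rng(0)
# 200 instances in 10 dimensions with decreasing scale per coordinate.
instances = rng.normal(size=(200, 10)) * np.linspace(3.0, 0.1, 10)
k = 3

best = best_subspace_in_hindsight(instances, k)
best_total = sum(projection_error(x, best) for x in instances)

# An arbitrary k-dimensional subspace for comparison.
random_basis, _ = np.linalg.qr(rng.normal(size=(10, k)))
random_total = sum(projection_error(x, random_basis) for x in instances)
```

The hindsight error equals the sum of the smallest n - k eigenvalues of the matrix `C`, so `best_total` is never larger than the total error of any other subspace of the same dimension.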