Model Based Population Tracking and Automatic Detection of Distribution Changes

Neural Information Processing Systems

Probabilistic mixture models are used for a broad range of data analysis tasks such as clustering, classification, and predictive modeling. Due to their inherent probabilistic nature, mixture models can easily be combined with other probabilistic or non-probabilistic techniques, thus forming more complex data analysis systems. In the case of online data (where there is a stream of data available), models can be constantly updated to reflect the most current distribution of the incoming data. However, in many business applications the models themselves represent a parsimonious summary of the data, and therefore it is not desirable to change models frequently, much less with every new data point. In such a framework it becomes crucial to track the applicability of the mixture model and detect the point in time when the model fails to adequately represent the data. In this paper we formulate the problem of change detection and propose a principled solution. Empirical results over both synthetic and real-life data sets are presented.



Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines

Neural Information Processing Systems

A mixed-signal paradigm is presented for high-resolution parallel inner-product computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5


Spectral Relaxation for K-means Clustering

Neural Information Processing Systems

In K-means, clusters are represented by the centers of mass of their members, and it can be shown that the K-means algorithm, which alternates between assigning each data vector to the nearest cluster center and recomputing each cluster center as the centroid of its member data vectors, is equivalent to minimizing a sum-of-squares cost function using coordinate descent. Despite the popularity of K-means clustering, one of its major drawbacks is that the coordinate descent search is prone to local minima. Much research has been done on computing refined initial points and adding explicit constraints to the sum-of-squares cost function for K-means clustering so that the search can converge to a better local minimum [1,2]. In this paper we tackle the problem from a different angle: we find an equivalent formulation of the sum-of-squares minimization as a trace maximization problem with special constraints; relaxing the constraints leads to a maximization problem that possesses globally optimal solutions. As a byproduct we also have an easily computable lower bound for the minimum of the sum-of-squares cost function. Our work is inspired by [9, 3], where the connection to the Gram matrix and the extension of the K-means method to general Mercer kernels were investigated. The rest of the paper is organized as follows: in section 2, we derive the equivalent trace maximization formulation and discuss its spectral relaxation. In section 3, we discuss how to assign cluster membership using pivoted QR decomposition, taking into account the special structure of the partial eigenvector matrix. Finally, in section 4, we illustrate the performance of the clustering algorithms using document clustering as an example.
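The lower bound mentioned in this abstract can be illustrated with a short sketch. It assumes the standard form of the relaxation: for a data matrix A whose columns are the data vectors, the sum-of-squares cost of any k-way partition is at least trace(AᵀA) minus the sum of the k largest eigenvalues of AᵀA. The function names, the toy data, and the column-as-data-vector convention are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def sum_of_squares_cost(A, labels):
    """K-means cost: squared distance of each column of A to its cluster centroid."""
    cost = 0.0
    for c in np.unique(labels):
        members = A[:, labels == c]
        centroid = members.mean(axis=1, keepdims=True)
        cost += float(np.sum((members - centroid) ** 2))
    return cost


def spectral_lower_bound(A, k):
    """trace(A^T A) minus the sum of the k largest eigenvalues of A^T A."""
    gram = A.T @ A
    eigvals = np.linalg.eigvalsh(gram)  # symmetric input, eigenvalues in ascending order
    return float(np.trace(gram) - eigvals[-k:].sum())


# Hypothetical toy data: 5-dimensional vectors, 40 of them, stored as columns.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 40))
k = 3

# Any partition with k non-empty clusters works here; this one is arbitrary.
labels = np.arange(A.shape[1]) % k

cost = sum_of_squares_cost(A, labels)
bound = spectral_lower_bound(A, k)
print(f"partition cost = {cost:.3f}, spectral lower bound = {bound:.3f}")
```

Because the bound holds for every partition, the gap between the cost reached by a K-means run and this value gives a rough indication of how far that run might be from the global minimum.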


Learning Lateral Interactions for Feature Binding and Sensory Segmentation

Neural Information Processing Systems

We present a new approach to the supervised learning of lateral interactions for the competitive layer model (CLM) dynamic feature binding architecture. The method is based on consistency conditions, which were recently shown to characterize the attractor states of this linear threshold recurrent network. For a given set of training examples the learning problem is formulated as a convex quadratic optimization problem in the lateral interaction weights. An efficient dimension reduction of the learning problem can be achieved by using a linear superposition of basis interactions.


Adaptive Sparseness Using Jeffreys Prior

Neural Information Processing Systems

In this paper we introduce a new sparseness-inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparseness-controlling hyper-parameters.


A kernel method for multi-labelled classification

Neural Information Processing Systems

This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems, but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares many properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.


Learning from Infinite Data in Finite Time

Neural Information Processing Systems

We propose the following general method for scaling learning algorithms to arbitrarily large data sets. We apply this method to the EM algorithm for mixtures of Gaussians. Preliminary experiments on a series of large data sets provide evidence of the potential of this approach. On the other hand, they require large computational resources to learn from. While in the past the factor limiting the quality of learnable models was typically the quantity of data available, in many domains today data is superabundant, and the bottleneck is the time required to process it.


Adaptive Nearest Neighbor Classification Using Support Vector Machines

Neural Information Processing Systems

The nearest neighbor technique is a simple and appealing method to address classification problems. It relies on the assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. We propose a technique that computes a locally flexible metric by means of Support Vector Machines (SVMs). The maximum margin boundary found by the SVM is used to determine the most discriminant direction over the query's neighborhood. This direction provides a local weighting scheme for input features.


TAP Gibbs Free Energy, Belief Propagation and Sparsity

Neural Information Processing Systems

The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbitrary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka's expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.