Statistical Learning
Approximate Correspondences in High Dimensions
Grauman, Kristen, Darrell, Trevor
Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding basedon a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithmssuffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier,our approach achieves improved object recognition results over a state-of-the-art set kernel.
Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis
Gore, Amit, Chakrabartty, Shantanu
A key challenge in designing analog-to-digital converters for cortically implanted prosthesis is to sense and process high-dimensional neural signals recorded by the micro-electrode arrays. In this paper, we describe a novel architecture for analog-to-digital (A/D) conversion that combines ฮฃ conversion with spatial de-correlation within a single module. The architecture called multiple-input multiple-output (MIMO) ฮฃ is based on a min-max gradient descent optimization ofa regularized linear cost function that naturally lends to an A/D formulation. Usingan online formulation, the architecture can adapt to slow variations in cross-channel correlations, observed due to relative motion of the microelectrodes withrespect to the signal sources. Experimental results with real recorded multi-channel neural data demonstrate the effectiveness of the proposed algorithm in alleviating cross-channel redundancy across electrodes and performing data-compressiondirectly at the A/D converter.
PG-means: learning the number of clusters in data
We present a novel algorithm called PG-means which is able to learn the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; ituses statistical hypothesis tests on one-dimensional projections of the data and model to determine if the examples are well represented by the model. In so doing, we are applying a statistical test for the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods.
Kernels on Structured Objects Through Nested Histograms
Cuturi, Marco, Fukumizu, Kenji
We propose a family of kernels for structured objects which is based on the bag-ofcomponents paradigm.However, rather than decomposing each complex object into the single histogram of its components, we use for each object a family of nested histograms, where each histogram in this hierarchy describes the object seen from an increasingly granular perspective. We use this hierarchy of histograms todefine elementary kernels which can detect coarse and fine similarities between the objects. We compute through an efficient averaging trick a mixture of such specific kernels, to propose a final kernel value which weights efficiently local and global matches. We propose experimental results on an image retrieval experiment which show that this mixture is an effective template procedure to be used with kernels on histograms.
On Transductive Regression
Cortes, Corinna, Mohri, Mehryar
In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive settingwhere the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithminspired by our bound that admits a primal and kernelized closedform solutionand deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regressionalgorithms shows that it performs well and that it can scale to large data sets.
Relational Learning with Gaussian Processes
Chu, Wei, Sindhwani, Vikas, Ghahramani, Zoubin, Keerthi, S. S.
Correlation between instances is often modelled via a kernel function using input attributesof the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributesusing Gaussian process techniques. This approach provides a novel nonparametric Bayesian framework with a data-dependent covariance function for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm.
Map-Reduce for Machine Learning on Multicore
Chu, Cheng-tao, Kim, Sang K., Lin, Yi-an, Yu, Yuanyuan, Bradski, Gary, Olukotun, Kunle, Ng, Andrew Y.
We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we develop a broadly applicable parallelprogramming method, one that is easily applied to many different learning algorithms. Our work is in distinct contrast to the tradition in machine learning of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show that algorithms that fit the Statistical Query model [15] can be written in a certain "summation form," which allows them to be easily parallelized onmulticore computers. We adapt Google's map-reduce [7] paradigm to demonstrate this parallel speed up technique on a variety of learning algorithms including locally weighted linear regression (LWLR), k-means, logistic regression (LR),naive Bayes (NB), SVM, ICA, PCA, gaussian discriminant analysis (GDA), EM, and backpropagation (NN). Our experimental results show basically linear speedup with an increasing number of processors.