Metric Learning by Collapsing Classes

Neural Information Processing Systems

We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance)for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes infinitely far away. We show that when the metric we learn is used in simple classifiers, ityields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.

Likelihood Estimation with Incomplete Array Variate Observations Machine Learning

Missing data is an important challenge when dealing with high dimensional data arranged in the form of an array. In this paper, we propose methods for estimation of the parameters of array variate normal probability model from partially observed multiway data. The methods developed here are useful for missing data imputation, estimation of mean and covariance parameters for multiway data. A multiway semi-parametric mixed effects model that allows separation of multiway covariance effects is also defined and an efficient algorithm for estimation is recommended. We provide simulation results along with real life data from genetics to demonstrate these methods.

A Fast Algorithm for Clustering High Dimensional Feature Vectors Machine Learning

We propose an algorithm for clustering high dimensional data. If $P$ features for $N$ objects are represented in an $N\times P$ matrix ${\bf X}$, where $N\ll P$, the method is based on exploiting the cluster-dependent structure of the $N\times N$ matrix ${\bf XX}^T$. Computational burden thus depends primarily on $N$, the number of objects to be clustered, rather than $P$, the number of features that are measured. This makes the method particularly useful in high dimensional settings, where it is substantially faster than a number of other popular clustering algorithms. Aside from an upper bound on the number of potential clusters, the method is independent of tuning parameters. When compared to $16$ other clustering algorithms on $32$ genomic datasets with gold standards, we show that it provides the most accurate cluster configuration more than twice as often than its closest competitors. We illustrate the method on data taken from highly cited genomic studies.

Multi-View Kernel Consensus For Data Analysis and Signal Processing Machine Learning

The input data features set for many data driven tasks is high-dimensional while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low dimensional structure imposed by the low dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low dimensional phenomena into the measured high dimensional observations might distort the distance metric, This distortion can effect the desired estimated low dimensional geometric structure. In this paper, we suggest to utilize the redundancy in the attribute domain by partitioning the attributes into multiple subsets we call views. The proposed methods utilize the agreement also called consensus between different views to extract valuable geometric information that unifies multiple views about the intrinsic relationships among several different observations. This unification enhances the information that a single view or a simple concatenations of views provides.

On the Spectrum of Random Features Maps of High Dimensional Data Machine Learning

Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze nonlinear operators. In this paper, we leverage the "concentration" phenomenon induced by random matrix theory to perform a spectral analysis on the Gram matrix of these random feature maps, here for Gaussian mixture models of simultaneously large dimension and size. Our results are instrumental to a deeper understanding on the interplay of the nonlinearity and the statistics of the data, thereby allowing for a better tuning of random feature-based techniques.