Statistical Learning
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
Bengio, Yoshua, Paiement, Jean-François, Vincent, Pascal, Delalleau, Olivier, Roux, Nicolas L., Ouimet, Marie
Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Locally Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, and Multi-Dimensional Scaling (for dimensionality reduction), as well as Spectral Clustering. This framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel. Numerical experiments show that the generalizations performed have a level of error comparable to the variability of the embedding algorithms due to the choice of training data.
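The core of such an out-of-sample extension can be sketched with the Nyström formula. A new point x receives the coordinate (1/λ) Σ_i v_i K(x, x_i), and at a training point x_j this reproduces v_j exactly, because Σ_i K(x_j, x_i) v_i = λ v_j. This is a minimal sketch under stated assumptions. It uses a plain Gaussian kernel (`gaussian_kernel`, a hypothetical helper) in place of the data-dependent, normalized kernels the paper derives for each algorithm. It also omits the paper's scaling constant, and it computes only the leading eigenvector, by power iteration.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    # Assumed fixed Gaussian kernel; the paper's kernels are data-dependent
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def leading_eigenpair(K, iters=1000):
    """Power iteration for the top eigenvalue/eigenvector of a symmetric PSD matrix."""
    n = len(K)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))  # ||K v|| approaches lambda as v converges
        v = [x / lam for x in w]
    return lam, v


def nystrom_extend(x, train, lam, v, kernel=gaussian_kernel):
    """Out-of-sample coordinate: (1/lam) * sum_i v_i * K(x, x_i)."""
    return sum(vi * kernel(x, xi) for vi, xi in zip(v, train)) / lam


train = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.0)]
K = [[gaussian_kernel(a, b) for b in train] for a in train]
lam, v = leading_eigenpair(K)

# At training points the extension reproduces the eigenvector entries
print([round(nystrom_extend(x, train, lam, v) - vj, 9) for x, vj in zip(train, v)])
# A new point gets a coordinate without recomputing the eigendecomposition
print(nystrom_extend((1.5, 0.2), train, lam, v))
```

Embedding a new point costs only n kernel evaluations against the training set, instead of a fresh eigendecomposition of an (n+1) × (n+1) matrix.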
Locality Preserving Projections
Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA), a classical linear technique that projects the data along the directions of maximal variance. When the high-dimensional data lies on a low-dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold.
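LPP reduces to the generalized eigenproblem X L Xᵀ a = λ X D Xᵀ a. Here W holds the neighborhood weights, D is the diagonal degree matrix, and L = D - W. The projection a is the eigenvector for the smallest eigenvalue. The sketch below solves this in closed form for 2-D points. It assumes a fully connected heat-kernel graph with width `t` (the method also admits k-nearest-neighbor or ε-neighborhood graphs). `heat_kernel_weights` and `lpp_direction_2d` are hypothetical names.

```python
import math


def heat_kernel_weights(X, t=1.0):
    # Assumption: fully connected graph with heat-kernel weights exp(-||xi - xj||^2 / t)
    n = len(X)
    return [[0.0 if i == j else
             math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / t)
             for j in range(n)] for i in range(n)]


def lpp_direction_2d(X, t=1.0):
    """Smallest generalized eigenpair of X L X^T a = lam X D X^T a for 2-D points."""
    W = heat_kernel_weights(X, t)
    n = len(X)
    deg = [sum(row) for row in W]            # diagonal of D
    A = [[0.0, 0.0], [0.0, 0.0]]             # X L X^T with L = D - W
    B = [[0.0, 0.0], [0.0, 0.0]]             # X D X^T
    for r in range(2):
        for c in range(2):
            B[r][c] = sum(deg[i] * X[i][r] * X[i][c] for i in range(n))
            A[r][c] = B[r][c] - sum(W[i][j] * X[i][r] * X[j][c]
                                    for i in range(n) for j in range(n))
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    # det(A - lam B) = p lam^2 + q lam + r = 0; LPP keeps the smaller root
    p = b11 * b22 - b12 ** 2
    q = -(a11 * b22 + a22 * b11 - 2 * a12 * b12)
    r = a11 * a22 - a12 ** 2
    lam = (-q - math.sqrt(max(q * q - 4 * p * r, 0.0))) / (2 * p)
    # Either row of the singular matrix (A - lam B) yields the null vector; keep the larger
    c1 = (-(a12 - lam * b12), a11 - lam * b11)
    c2 = (a22 - lam * b22, -(a12 - lam * b12))
    vec = max(c1, c2, key=lambda u: math.hypot(*u))
    norm = math.hypot(*vec)
    if norm == 0.0:  # A = lam B, so every direction is an eigenvector
        vec, norm = (1.0, 0.0), 1.0
    return lam, (vec[0] / norm, vec[1] / norm), A, B


X = [(0.0, 0.0), (1.0, 0.2), (2.0, -0.1), (3.0, 0.1), (1.5, 2.0)]
lam, a, A, B = lpp_direction_2d(X)
print(lam, a)
```

Because λ_min is the minimum of the ratio aᵀAa / aᵀBa, the returned direction is the one along which neighboring points stay closest relative to their degree-weighted spread. PCA, by contrast, maximizes the global variance.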
Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
New feature selection algorithms for linear threshold functions are described which combine backward elimination with an adaptive regularization method. This makes them particularly suitable for the classification of microarray expression data, where the goal is to obtain accurate rules depending on only a few genes. Our algorithms are fast and easy to implement, since they center on an incremental (large margin) algorithm which allows us to avoid linear, quadratic or higher-order programming methods. We report on preliminary experiments with five known DNA microarray datasets. These experiments suggest that multiplicative large margin algorithms tend to outperform additive algorithms (such as SVM) on feature selection tasks.
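The backward-elimination loop can be sketched in three steps. First, train a linear threshold function with multiplicative updates. Next, drop the feature with the smallest absolute weight. Then retrain on the remaining features and repeat. This is a minimal sketch that swaps in balanced Winnow (a classic multiplicative, mistake-driven update) for the paper's large margin algorithms, and it omits the adaptive regularization. `train_balanced_winnow`, `backward_eliminate`, `eta`, and `margin` are hypothetical names.

```python
import itertools
import math


def train_balanced_winnow(X, y, eta=0.5, margin=0.0, max_epochs=50):
    """Balanced Winnow: weights w = u - v, both parts updated multiplicatively."""
    d = len(X[0])
    u = [1.0] * d  # positive part of each weight
    v = [1.0] * d  # negative part of each weight
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            score = sum((uj - vj) * xj for uj, vj, xj in zip(u, v, x))
            if label * score <= margin:  # update on errors and margin violations
                mistakes += 1
                for j in range(d):
                    u[j] *= math.exp(eta * label * x[j])
                    v[j] *= math.exp(-eta * label * x[j])
        if mistakes == 0:
            break
    return [uj - vj for uj, vj in zip(u, v)]


def backward_eliminate(X, y, keep=1, **train_kwargs):
    """Retrain, drop the feature with the smallest |weight|, repeat until `keep` remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > keep:
        X_sub = [[x[j] for j in remaining] for x in X]
        w = train_balanced_winnow(X_sub, y, **train_kwargs)
        drop = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[drop]
    return remaining


# Toy data: all 16 points of {-1, +1}^4; only feature 0 determines the label
X = [list(bits) for bits in itertools.product([-1, 1], repeat=4)]
y = [x[0] for x in X]
selected = backward_eliminate(X, y, keep=1)
print(selected)  # [0]
```

Winnow-style multiplicative updates have mistake bounds that grow only logarithmically with the number of irrelevant features. This is one reason they suit data with many genes and few relevant ones.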
Efficient and Robust Feature Extraction by Maximum Margin Criterion
Li, Haifeng, Jiang, Tao, Zhang, Keshu
A new feature extraction criterion, maximum margin criterion (MMC), is proposed in this paper. This new criterion is general in the sense that, when combined with a suitable constraint, it can actually give rise to the most popular feature extractor in the literature, linear discriminant analysis (LDA).
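Assume the criterion takes the trace-difference form J = tr(S_b - S_w), where S_b and S_w are the prior-weighted between-class and within-class scatter matrices. Under that assumption, the best one-dimensional projection is the leading eigenvector of S_b - S_w. Unlike LDA, this requires no inverse of S_w. The sketch below handles 2-D data in closed form. `mean` and `mmc_direction_2d` are hypothetical helpers.

```python
import math


def mean(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def mmc_direction_2d(classes):
    """Top eigenpair of S_b - S_w for 2-D data; `classes` is a list of point lists."""
    all_pts = [p for pts in classes for p in pts]
    n = len(all_pts)
    m = mean(all_pts)
    M = [[0.0, 0.0], [0.0, 0.0]]  # accumulates S_b - S_w
    for pts in classes:
        prior = len(pts) / n
        mc = mean(pts)
        for r in range(2):
            for c in range(2):
                between = (mc[r] - m[r]) * (mc[c] - m[c])
                within = sum((p[r] - mc[r]) * (p[c] - mc[c]) for p in pts) / len(pts)
                M[r][c] += prior * (between - within)
    a, b, c = M[0][0], M[0][1], M[1][1]
    # Largest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]]
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if abs(b) < 1e-15:
        vec = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        vec = (b, lam - a)
    norm = math.hypot(*vec)
    return lam, (vec[0] / norm, vec[1] / norm)


# Two classes separated along x, each spread along y
classes = [[(0.0, -1.0), (0.0, 1.0)], [(4.0, -1.0), (4.0, 1.0)]]
lam, w = mmc_direction_2d(classes)
print(lam, w)  # 4.0 (1.0, 0.0)
```

The chosen direction separates the class means while penalizing within-class spread. Because S_w enters only by subtraction, the computation stays well defined even when S_w is singular, a common situation with few samples in high dimensions.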
Clustering with the Connectivity Kernel
Fischer, Bernd, Roth, Volker, Buhmann, Joachim M.
Clustering aims at extracting hidden structure in data sets. While the problem of finding compact clusters has been widely studied in the literature, extracting arbitrarily formed elongated structures is considered a much harder problem. In this paper we present a novel clustering algorithm which tackles the problem by a two-step procedure: first, the data are transformed in such a way that elongated structures become compact ones. In a second step, these new objects are clustered by optimizing a compactness-based criterion. The advantages of the method over related approaches are threefold: (i) robustness properties of compactness-based criteria naturally transfer to the problem of extracting elongated structures, leading to a model which is highly robust against outlier objects; (ii) the transformed distances induce a Mercer kernel which allows us to formulate a polynomial approximation scheme to the generally NP-hard clustering problem; (iii) the new method does not contain free kernel parameters, in contrast to methods like spectral clustering or mean-shift clustering.
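The first step, turning elongated structures into compact ones, can be illustrated with a path-based (minimax) dissimilarity. Under this measure, the distance between two points is the smallest possible value of the largest hop along any path connecting them. This is a minimal sketch that assumes Euclidean hop lengths and computes all pairs with a Floyd-Warshall pass over the (min, max) semiring. The Mercer kernel the paper builds from these distances and the subsequent compactness-based clustering are not shown. `minimax_distances` is a hypothetical name.

```python
import math


def minimax_distances(points):
    """Path-based dissimilarity: min over paths of the largest Euclidean hop on the path."""
    n = len(points)
    D = [[math.dist(p, q) for q in points] for p in points]
    # Floyd-Warshall with (min, max) in place of the usual (min, +)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(D[i][k], D[k][j])
                if via_k < D[i][j]:
                    D[i][j] = via_k
    return D


# Two parallel elongated chains: unit spacing inside a chain, gap of 5 between chains
chain_a = [(float(i), 0.0) for i in range(6)]
chain_b = [(float(i), 5.0) for i in range(6)]
points = chain_a + chain_b
D = minimax_distances(points)
print(math.dist(points[0], points[5]), D[0][5])  # 5.0 1.0  (along chain a)
print(math.dist(points[0], points[6]), D[0][6])  # 5.0 5.0  (across the gap)
```

In Euclidean terms, the length of each chain (5.0) equals the gap between the chains, so a compactness criterion cannot separate them. After the transform, every chain has diameter 1.0 while the gap stays at 5.0. The resulting dissimilarity is an ultrametric.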
Convex Methods for Transduction
Bie, Tijl D., Cristianini, Nello
The 2-class transduction problem, as formulated by Vapnik [1], involves finding a separating hyperplane for a labelled data set that is also maximally distant from a given set of unlabelled test points. In this form, the problem has exponential computational complexity in the size of the working set. So far it has been attacked by means of integer programming techniques [2] that do not scale to reasonable problem sizes, or by local search procedures [3]. In this paper we present a relaxation of this task based on semidefinite programming (SDP), resulting in a convex optimization problem that has polynomial complexity in the size of the data set. The results are very encouraging for mid-sized data sets; however, the cost is still too high for large-scale problems, due to the high-dimensional search space. To address this, we restrict the feasible region by introducing an approximation based on solving an eigenproblem. With this approximation, the computational cost of the algorithm is such that problems with more than 1000 points can be handled.
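The exponential cost comes from searching over all labelings of the test points. A standard way to obtain a convex SDP relaxation of such a search is to lift the label vector into a matrix and drop the rank constraint. The step is shown here as an illustration; the exact program in the paper may differ. Every matrix of the form yyᵀ with y ∈ {−1, +1}ⁿ is characterized by three conditions:

```latex
\{\mathbf{y}\mathbf{y}^{\top} : \mathbf{y}\in\{-1,+1\}^{n}\}
=
\{\Gamma \in \mathbb{S}^{n} :\ \Gamma \succeq 0,\ \operatorname{diag}(\Gamma)=\mathbf{1},\ \operatorname{rank}(\Gamma)=1\}
\quad\xrightarrow{\ \text{drop rank}\ }\quad
\{\Gamma \in \mathbb{S}^{n} :\ \Gamma \succeq 0,\ \operatorname{diag}(\Gamma)=\mathbf{1},\ \Gamma_{ij}=y_i y_j \text{ for labelled } i,j\}
```

The set on the right is convex, so optimizing a margin-based objective over it is an SDP that can be solved in polynomial time. A hard labeling can then be read off from the result, for example from the signs of its leading eigenvector.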