In Natural Language Processing (NLP) tasks, data often have the following two properties. First, the data can be split into multiple views, a structure that has been used successfully for dimension reduction; for example, in topic classification, every paper can be split into its title, main text, and references. However, it is common that some views are noisier than others for supervised learning problems. Second, unlabeled data are easy to obtain while labeled data are relatively scarce; for example, articles that appeared in the New York Times over the past 10 years are easy to collect, but labeling them as 'Politics', 'Finance', or 'Sports' requires human labor. Hence, less noisy features are preferred before running supervised learning methods. In this paper we propose an unsupervised algorithm that optimally weights features from different views when these views are generated from a low-dimensional hidden state, as in widely used models such as the Gaussian mixture model, the hidden Markov model (HMM), and latent Dirichlet allocation (LDA).
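
To make the weighting idea concrete, here is a minimal, hypothetical sketch (not the paper's algorithm): because views that are conditionally independent given a shared low-dimensional hidden state carry common signal only through that state, a view's cross-covariance with the other views can serve as a signal proxy, and each view can be weighted by that signal relative to its total variance. The function names and the particular weighting rule are illustrative assumptions.

```python
import numpy as np

def weight_views(views, k=2):
    """Toy sketch: weight each view by how strongly it co-varies with the
    remaining views.  Under a shared low-dimensional hidden state, cross-view
    covariance carries signal while view-specific noise averages out.
    `views` is a list of (n_samples, d_i) arrays; returns weighted features
    and the per-view weights."""
    n = views[0].shape[0]
    centered = [v - v.mean(axis=0) for v in views]
    weights = []
    for i, vi in enumerate(centered):
        # signal proxy: top-k singular values of the cross-covariance between
        # this view and the concatenation of all other views
        rest = np.hstack([v for j, v in enumerate(centered) if j != i])
        cross = vi.T @ rest / n
        signal = np.linalg.svd(cross, compute_uv=False)[:k].sum()
        noise = np.trace(vi.T @ vi / n)          # total variance of the view
        weights.append(signal / (noise + 1e-12))
    weights = np.array(weights) / np.sum(weights)
    return np.hstack([w * v for w, v in zip(weights, views)]), weights
```

In a topic-classification setting, the three arrays could be bag-of-words features for title, main text, and references; the weighted concatenation would then be fed to an ordinary supervised learner.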

DeMers, David, Cottrell, Garrison W.

A method for creating a nonlinear encoder-decoder for multidimensional data with compact representations is presented. The commonly used technique of autoassociation is extended to allow nonlinear representations, and an objective function which penalizes activations of individual hidden units is shown to result in minimum-dimensional encodings with respect to allowable error in reconstruction. 1 INTRODUCTION Reducing the dimensionality of data with minimal information loss is important for feature extraction, compact coding and computational efficiency. The data can be transformed into "good" representations for further processing, constraints among feature variables may be identified, and redundancy eliminated. Many algorithms are exponential in the dimensionality of the input, so even reduction by a single dimension may provide valuable computational savings. Autoassociating feedforward networks with one hidden layer have been shown to extract the principal components of the data (Baldi & Hornik, 1988). Such networks have been used to extract features and develop compact encodings of the data (Cottrell, Munro & Zipser, 1989). Principal Components Analysis projects the data into a linear subspace.
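
Below is a minimal sketch of the penalized-autoassociation idea, assuming a single tanh hidden layer trained by plain gradient descent rather than the authors' deeper architecture: the loss adds an L1 penalty on hidden-unit activations, so units not needed for accurate reconstruction are pushed toward zero and the count of remaining active units gives a rough estimate of the encoding dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_hidden=8, penalty=1e-2, lr=0.1, epochs=2000):
    """Sketch of a nonlinear autoassociator with an activation penalty.
    Loss = 0.5*||Xhat - X||^2 / n + (penalty / n) * sum(|H|)."""
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # nonlinear hidden code
        Xhat = H @ W2 + b2                       # linear reconstruction
        err = Xhat - X
        dXhat = err / n                          # gradient of reconstruction term
        dW2 = H.T @ dXhat; db2 = dXhat.sum(axis=0)
        dH = dXhat @ W2.T + penalty * np.sign(H) / n   # + gradient of L1 penalty
        dpre = dH * (1 - H ** 2)                 # tanh derivative
        dW1 = X.T @ dpre; db1 = dpre.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    H = np.tanh(X @ W1 + b1)
    n_active = int(np.sum(np.mean(np.abs(H), axis=0) > 1e-2))  # surviving units
    return (W1, b1, W2, b2), n_active
```

The threshold used to decide which units remain "active" is an illustrative assumption; in practice one would trade it off against the allowable reconstruction error.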

Wang, Meihong, Sha, Fei, Jordan, Michael I.

We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
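
As an illustration of the kind of kernel-based dependence measure involved, the sketch below computes a biased HSIC-style statistic between the original covariates and a candidate low-dimensional embedding, and uses it to score embeddings. The paper optimizes the projection directly; here random projections are merely scored, purely to show how the measure is used.

```python
import numpy as np

def rbf_gram(X, sigma=None):
    """Gaussian-kernel Gram matrix; sigma defaults to the median distance."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(X, Z):
    """Biased HSIC estimate between two paired samples (rows are paired)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(X), rbf_gram(Z)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Score candidate 2-D embeddings by how much dependence with the original
# covariates they retain (random projections, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
best_W = max((rng.normal(size=(10, 2)) for _ in range(20)),
             key=lambda W: hsic(X, X @ W))
```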

Kambhatla, Nanda, Leen, Todd K.

We propose a new distance measure which is optimal for the task of local PCA. Our results with speech and image data indicate that the nonlinear techniques provide more accurate encodings than PCA. Our local linear algorithm produces more accurate encodings (except for one simulation with image data), and trains much faster than five-layer auto-associative networks.
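
The following sketch illustrates the local linear (clustered PCA) idea under a reconstruction-distance assignment rule: each point is assigned to whichever local subspace reconstructs it best, and each subspace is then refit by PCA on its assigned points. It is an illustrative reading of the approach, not the authors' exact algorithm or distance measure.

```python
import numpy as np

def local_pca_fit(X, n_clusters=4, n_components=2, iters=20, seed=0):
    """Sketch of local PCA: alternate (1) assigning points to the cluster
    whose local PCA subspace reconstructs them with least error and
    (2) refitting each cluster's mean and principal subspace."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    bases = [np.eye(X.shape[1])[:, :n_components] for _ in range(n_clusters)]
    for _ in range(iters):
        # reconstruction error of every point under each cluster's subspace
        errs = np.stack(
            [np.sum(((X - c) - ((X - c) @ B) @ B.T) ** 2, axis=1)
             for c, B in zip(centers, bases)], axis=1)
        labels = errs.argmin(axis=1)
        for j in range(n_clusters):
            pts = X[labels == j]
            if len(pts) < n_components:
                continue
            centers[j] = pts.mean(axis=0)
            _, _, Vt = np.linalg.svd(pts - centers[j], full_matrices=False)
            bases[j] = Vt[:n_components].T
    return centers, bases, labels
```

A point is then encoded by its cluster index plus its coordinates in that cluster's principal subspace, which is what makes the scheme a piecewise-linear alternative to a single global PCA.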

We give a simple and effective two stage algorithm for approximating a point cloud $\mathcal{S}\subset\mathbb{R}^m$ by a simplicial complex $K$. The first stage is an iterative fitting procedure that generalizes k-means clustering, while the second stage involves deleting redundant simplices. A form of dimension reduction of $\mathcal{S}$ is obtained as a consequence.
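
A toy sketch of the two-stage idea, restricted to 1-simplices (line segments): stage one alternates a k-means-like assignment of points to their nearest segment with a PCA refit of each segment, and stage two deletes segments that end up nearly unused. The initialization, segment half-length rule, and pruning threshold are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def nearest_point_on_segment(x, a, b):
    """Project x onto the segment [a, b]."""
    ab = b - a
    t = np.clip(np.dot(x - a, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
    return a + t * ab

def fit_segments(S, n_segments=5, iters=15, min_points=5, seed=0):
    """Stage 1: k-means-like fitting of 1-simplices to the point cloud S.
    Stage 2: delete segments with too few assigned points (redundant)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(S), n_segments, replace=False)
    segs = [(S[i], S[i] + 0.1 * rng.normal(size=S.shape[1])) for i in idx]
    for _ in range(iters):
        # distance of every point to every segment
        proj = np.stack([[nearest_point_on_segment(x, a, b) for x in S]
                         for a, b in segs])              # (k, n, d)
        d2 = np.sum((proj - S[None]) ** 2, axis=2)       # (k, n)
        labels = d2.argmin(axis=0)
        for j in range(len(segs)):
            pts = S[labels == j]
            if len(pts) < 2:
                continue
            c = pts.mean(axis=0)
            _, sv, Vt = np.linalg.svd(pts - c, full_matrices=False)
            half = sv[0] / np.sqrt(len(pts)) * Vt[0]     # principal direction
            segs[j] = (c - half, c + half)
    kept = [seg for j, seg in enumerate(segs) if np.sum(labels == j) >= min_points]
    return kept, labels
```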