Goto

Collaborating Authors

 Dimensionality Reduction


Sufficient Dimensionality Reduction with Irrelevant Statistics

arXiv.org Machine Learning

The problem of finding a reduced dimensionality representation of categorical variables while preserving their most relevant characteristics is fundamental for the analysis of complex data. Specifically, given a co-occurrence matrix of two variables, one often seeks a compact representation of one variable which preserves information about the other variable. We have recently introduced ``Sufficient Dimensionality Reduction' [GT-2003], a method that extracts continuous reduced dimensional features whose measurements (i.e., expectation values) capture maximal mutual information among the variables. However, such measurements often capture information that is irrelevant for a given task. Widely known examples are illumination conditions, which are irrelevant as features for face recognition, writing style which is irrelevant as a feature for content classification, and intonation which is irrelevant as a feature for speech recognition. Such irrelevance cannot be deduced apriori, since it depends on the details of the task, and is thus inherently ill defined in the purely unsupervised case. Separating relevant from irrelevant features can be achieved using additional side data that contains such irrelevant structures. This approach was taken in [CT-2002], extending the information bottleneck method, which uses clustering to compress the data. Here we use this side-information framework to identify features whose measurements are maximally informative for the original data set, but carry as little information as possible on a side data set. In statistical terms this can be understood as extracting statistics which are maximally sufficient for the original dataset, while simultaneously maximally ancillary for the side dataset. We formulate this tradeoff as a constrained optimization problem and characterize its solutions. We then derive a gradient descent algorithm for this problem, which is based on the Generalized Iterative Scaling method for finding maximum entropy distributions. The method is demonstrated on synthetic data, as well as on real face recognition datasets, and is shown to outperform standard methods such as oriented PCA.


Dimensionality Reduction and Classification feature using Mutual Information applied to Hyperspectral Images : A Filter strategy based algorithm

arXiv.org Artificial Intelligence

Hyperspectral images (HIS) classification is a high technical remote sensing tool. The goal is to reproduce a thematic map that will be compared with a reference ground truth map (GT), constructed by expecting the region. The HIS contains more than a hundred bidirectional measures, called bands (or simply images), of the same region. They are taken at juxtaposed frequencies. Unfortunately, some bands contain redundant information, others are affected by the noise, and the high dimensionality of features made the accuracy of classification lower. The problematic is how to find the good bands to classify the pixels of regions. Some methods use Mutual Information (MI) and threshold, to select relevant bands, without treatment of redundancy. Others control and eliminate redundancy by selecting the band top ranking the MI, and if its neighbors have sensibly the same MI with the GT, they will be considered redundant and so discarded. This is the most inconvenient of this method, because this avoids the advantage of hyperspectral images: some precious information can be discarded. In this paper we'll accept the useful redundancy. A band contains useful redundancy if it contributes to produce an estimated reference map that has higher MI with the GT.nTo control redundancy, we introduce a complementary threshold added to last value of MI. This process is a Filter strategy; it gets a better performance of classification accuracy and not expensive, but less preferment than Wrapper strategy.


Variable noise and dimensionality reduction for sparse Gaussian processes

arXiv.org Machine Learning

The sparse pseudo-input Gaussian process (SPGP) is a new approximation method for speeding up GP regression in the case of a large number of data points N. The approximation is controlled by the gradient optimization of a small set of M `pseudo-inputs', thereby reducing complexity from N^3 to NM^2. One limitation of the SPGP is that this optimization space becomes impractically big for high dimensional data sets. This paper addresses this limitation by performing automatic dimensionality reduction. A projection of the input space to a low dimensional space is learned in a supervised manner, alongside the pseudo-inputs, which now live in this reduced space. The paper also investigates the suitability of the SPGP for modeling data with input-dependent noise. A further extension of the model is made to make it even more powerful in this regard - we learn an uncertainty parameter for each pseudo-input. The combination of sparsity, reduced dimension, and input-dependent noise makes it possible to apply GPs to much larger and more complex data sets than was previously practical. We demonstrate the benefits of these methods on several synthetic and real world problems.


Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations

arXiv.org Machine Learning

We demonstrate that almost all non-parametric dimensionality reduction methods can be expressed by a simple procedure: regularized loss minimization plus singular value truncation. By distinguishing the role of the loss and regularizer in such a process, we recover a factored perspective that reveals some gaps in the current literature. Beyond identifying a useful new loss for manifold unfolding, a key contribution is to derive new convex regularizers that combine distance maximization with rank reduction. These regularizers can be applied to any loss.


Dimensionality Reduction by Local Discriminative Gaussians

arXiv.org Machine Learning

We present local discriminative Gaussian (LDG) dimensionality reduction, a supervised dimensionality reduction technique for classification. The LDG objective function is an approximation to the leave-one-out training error of a local quadratic discriminant analysis classifier, and thus acts locally to each training point in order to find a mapping where similar data can be discriminated from dissimilar data. While other state-of-the-art linear dimensionality reduction methods require gradient descent or iterative solution approaches, LDG is solved with a single eigen-decomposition. Thus, it scales better for datasets with a large number of feature dimensions or training examples. We also adapt LDG to the transfer learning setting, and show that it achieves good performance when the test data distribution differs from that of the training data.


Quantitative Comparison of Linear and Non-linear Dimensionality Reduction Techniques for Solar Image Archives

AAAI Conferences

This work investigates the applicability of several dimensionality reduction techniques for large scale solar data analysis. Using the first solar domain-specific benchmark dataset that contains images of multiple types of phenomena, we investigate linear and non-linear dimensionality reduction methods in order to reduce our storage costs and maintain an accurate representation of our data in a new vector space. We present a comparative analysis between several dimensionality reduction methods and different numbers of target dimensions by utilizing different classifiers in order to determine the percentage of dimensionality reduction that can be achieved on solar data with said methods, and to discover the method that is the most effective for solar images.


Dimensionality Reduction Using the Sparse Linear Model

Neural Information Processing Systems

We propose an approach for linear unsupervised dimensionality reduction, based on the sparse linear model that has been used to probabilistically interpret sparse coding. We formulate an optimization problem for learning a linear projection from the original signal domain to a lower-dimensional one in a way that approximately preserves, in expectation, pairwise inner products in the sparse domain. We derive solutions to the problem, present nonlinear extensions, and discuss relations to compressed sensing. Our experiments using facial images, texture patches, and images of object categories suggest that the approach can improve our ability to recover meaningful structure in many classes of signals.


Direct Density-Ratio Estimation with Dimensionality Reduction via Hetero-Distributional Subspace Analysis

AAAI Conferences

Methods for estimating the ratio of two probability density functions have been actively explored recently since they can be used for various data processing tasks such as non-stationarity adaptation, outlier detection, feature selection, and conditional probability estimation. In this paper, we propose a new density-ratio estimator which incorporates dimensionality reduction into the density-ratio estimation procedure. Through experiments, the proposed method is shown to compare favorably with existing density-ratio estimators in terms of both accuracy and computational costs.


A convex model for non-negative matrix factorization and dimensionality reduction on physical space

arXiv.org Machine Learning

A collaborative convex framework for factoring a data matrix $X$ into a non-negative product $AS$, with a sparse coefficient matrix $S$, is proposed. We restrict the columns of the dictionary matrix $A$ to coincide with certain columns of the data matrix $X$, thereby guaranteeing a physically meaningful dictionary and dimensionality reduction. We use $l_{1,\infty}$ regularization to select the dictionary from the data and show this leads to an exact convex relaxation of $l_0$ in the case of distinct noise free data. We also show how to relax the restriction-to-$X$ constraint by initializing an alternating minimization approach with the solution of the convex model, obtaining a dictionary close to but not necessarily in $X$. We focus on applications of the proposed framework to hyperspectral endmember and abundances identification and also show an application to blind source separation of NMR data.


CrossBridge: Finding Analogies Using Dimensionality Reduction

AAAI Conferences

We present CrossBridge, a practical algorithm for retrieving analogies in large, sparse semantic networks. Other algorithms adopt a generate-and-test approach, retrieving candidate analogies by superficial similarity of concepts, then testing them for the particular relations involved in the analogy. CrossBridge adopts a global approach. It organizes the entire knowledge space at once, as a matrix of small concept-and-relation subgraph patterns versus actual occurrences of subgraphs from the knowledge base. It uses the familiar mathematics of dimensionality reduction to reorganize this space along dimensions representing approximate semantic similarity of these subgraphs. Analogies can then be retrieved by simple nearest-neighbor comparison. CrossBridge also takes into account not only knowledge directly related to the source and target domains, but also a large background Commonsense knowledge base. Commonsense influences the mapping between domains, preserving important relations while ignoring others. This property allows CrossBridge to find more intuitive and extensible analogies. We compare our approach with an implementation of structure mapping and show that our algorithm consistently finds analogies in cases where structure mapping fails. We also present some discovered analogies.