kcca
aab3238922bcc25a6f606eb525ffdc56-Reviews.html
The paper introduces a new method called RDC to measure the statistical dependence between random variables. It combines a copula transform with a variant of kernel CCA using random projections, resulting in O(n log n) complexity. Experiments on synthetic and real benchmark data show promising results for feature selection. The paper is overall clear and pleasant to read. The good experimental results and simplicity of implementation suggest that the proposed method may be a useful complement to other existing methods.
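The three ingredients named in the review (copula transform, random projections, CCA) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `rdc_sketch`, the sine nonlinearity, the Gaussian weights, the defaults `k=10` and `scale=3.0`, and the small ridge term `reg` are all assumptions made here for the example.

```python
import numpy as np

def copula_transform(x):
    """Map each column to (0, 1] through its empirical CDF (rank / n)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    # Double argsort yields 0-based ranks per column; sorting is the O(n log n) step.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / n

def random_sine_features(u, k, scale, rng):
    """Project [u, 1] onto k random Gaussian directions and apply sin (assumed choice)."""
    n, d = u.shape
    u1 = np.hstack([u, np.ones((n, 1))])          # bias column acts as a random phase
    w = rng.normal(scale=scale, size=(d + 1, k))
    return np.sin(u1 @ w)

def largest_canonical_correlation(a, b, reg=1e-6):
    """Largest canonical correlation between feature matrices a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    n = a.shape[0]
    caa = a.T @ a / n + reg * np.eye(a.shape[1])  # ridge term keeps Cholesky stable
    cbb = b.T @ b / n + reg * np.eye(b.shape[1])
    cab = a.T @ b / n
    la = np.linalg.cholesky(caa)
    lb = np.linalg.cholesky(cbb)
    m = np.linalg.solve(la, cab)                  # La^{-1} Cab
    m = np.linalg.solve(lb, m.T).T                # La^{-1} Cab Lb^{-T}
    s = np.linalg.svd(m, compute_uv=False)
    return float(min(s[0], 1.0))

def rdc_sketch(x, y, k=10, scale=3.0, seed=0):
    """Hypothetical RDC-style dependence score in [0, 1]."""
    rng = np.random.default_rng(seed)
    fx = random_sine_features(copula_transform(x), k, scale, rng)
    fy = random_sine_features(copula_transform(y), k, scale, rng)
    return largest_canonical_correlation(fx, fy)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, size=2000)
    y_dependent = x ** 2 + 0.05 * rng.normal(size=2000)   # nonlinear, non-monotone link
    y_independent = rng.uniform(-1, 1, size=2000)
    print("dependent:  ", rdc_sketch(x, y_dependent))
    print("independent:", rdc_sketch(x, y_independent))
```

On this toy data the score for the quadratic relationship should come out clearly higher than the score for the independent pair, even though the Pearson correlation of `x` and `x ** 2` is close to zero.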
Two view learning: SVM-2K, Theory and Practice
Kernel methods make it relatively easy to define complex high-dimensional feature spaces. This raises the question of how we can identify the relevant subspaces for a particular learning task. When two views of the same phenomenon are available, kernel Canonical Correlation Analysis (KCCA) has been shown to be an effective preprocessing step that can improve the performance of classification algorithms such as the Support Vector Machine (SVM). This paper takes this observation to its logical conclusion and proposes a method that combines this two-stage learning (KCCA followed by SVM) into a single optimisation termed SVM-2K. We present both experimental and theoretical analysis of the approach, showing encouraging results and insights.
Deep Canonically Correlated LSTMs
Mallinar, Neil, Rosset, Corbin
Abstract-- We examine Deep Canonically Correlated LSTMs as a way to learn nonlinear transformations of variable length sequences and embed them into a correlated, fixed dimensional space. We use LSTMs to transform multi-view time-series data non-linearly while learning temporal relationships within the data. We then perform correlation analysis on the outputs of these neural networks to find a correlated subspace through which we get our final representation via projection. This work follows from previous work done on Deep Canonical Correlation (DCCA), in which deep feed-forward neural networks were used to learn nonlinear transformations of data while maximizing correlation.
I. INTRODUCTION
It is common in modern data sets to have multiple views of data collected of a phenomenon, for instance, a set of images and their captions in text, or audio and video data of the same event. If there exist labels, the views are conditionally uncorrelated on them, and it is typically assumed that noise sources between views are uncorrelated so that the representations are discriminating of the underlying semantic content. To distinguish it from multi-modal learning, multi-view learning trains a model or classifier for each view, the application of which depends on what data is available at test time. Typically it is desirable to find representations for each view that are predictive of - and predicted by - the other views so that if one view is not available at test time, it can serve to denoise the other views, or serve as a soft supervisor providing pseudo-labels. The benefits of training on multiple views include reduced sample complexity for prediction scenarios [1], relaxed separation conditions for clustering [2], among others. CCA techniques are used successfully across a wide array of downstream tasks (often unsupervised) from fMRI analysis [3], to retrieval, categorization, and clustering of text documents [4], [5], to acoustic feature learning [6]-[8].
Nonparametric Canonical Correlation Analysis
Michaeli, Tomer, Wang, Weiran, Livescu, Karen
Canonical correlation analysis (CCA) is a classical representation learning technique for finding correlated variables in multi-view data. Several nonlinear extensions of the original linear CCA have been proposed, including kernel and deep neural network methods. These approaches seek maximally correlated projections among families of functions, which the user specifies (by choosing a kernel or neural network structure), and are computationally demanding. Interestingly, the theory of nonlinear CCA, without functional restrictions, had been studied in the population setting by Lancaster already in the 1950s, but these results have not inspired practical algorithms. We revisit Lancaster's theory to devise a practical algorithm for nonparametric CCA (NCCA). Specifically, we show that the solution can be expressed in terms of the singular value decomposition of a certain operator associated with the joint density of the views. Thus, by estimating the population density from data, NCCA reduces to solving an eigenvalue system, superficially like kernel CCA but, importantly, without requiring the inversion of any kernel matrix. We also derive a partially linear CCA (PLCCA) variant in which one of the views undergoes a linear projection while the other is nonparametric. Using a kernel density estimate based on a small number of nearest neighbors, our NCCA and PLCCA algorithms are memory-efficient, often run much faster, and perform better than kernel CCA and comparable to deep CCA.
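For orientation, the population problem behind nonlinear CCA without functional restrictions is commonly written as below for a single pair of projections. The notation (f, g, p, s) is introduced here as an assumption and is not taken from the abstract; the second line is one common way to write a kernel built from the joint density, offered as a sketch of the kind of operator whose singular value decomposition the abstract refers to.

```latex
\max_{f,\,g}\; \mathbb{E}\big[f(X)\,g(Y)\big]
\quad \text{s.t.} \quad
\mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0, \qquad
\mathbb{E}[f(X)^2] = \mathbb{E}[g(Y)^2] = 1

s(x, y) \;=\; \frac{p(x, y)}{\sqrt{p_X(x)}\,\sqrt{p_Y(y)}}
```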
Multi-Label Classification with Feature-Aware Non-Linear Label Space Transformation
Li, Xin (Temple University) | Guo, Yuhong (Temple University)
Multi-label classification with many classes has recently drawn a lot of attention. Existing methods address this problem by performing linear label space transformation to reduce the dimension of label space, and then conducting independent regression for each reduced label dimension. These methods, however, do not capture nonlinear correlations of the multiple labels and may lead to significant information loss in the process of label space reduction. In this paper, we first propose to exploit kernel canonical correlation analysis (KCCA) to capture nonlinear label correlation information and perform nonlinear label space reduction. Then we develop a novel label space reduction method that explicitly combines linear and nonlinear label space transformations based on CCA and KCCA respectively to address multi-label classification with many classes. The proposed method is a feature-aware label transformation method that promotes the label predictability in the transformed label space from the input features. We conduct experiments on a number of multi-label classification datasets. The proposed approach demonstrates good performance compared to a number of state-of-the-art label dimension reduction methods.
Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods
Arenas-García, Jerónimo, Petersen, Kaare Brandt, Camps-Valls, Gustavo, Hansen, Lars Kai
Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance in a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.
Sparse Additive Functional and Kernel CCA
Balakrishnan, Sivaraman, Puniyani, Kriti, Lafferty, John
Canonical Correlation Analysis (CCA) is a classical tool for finding correlations among the components of two random vectors. In recent years, CCA has been widely applied to the analysis of genomic data, where it is common for researchers to perform multiple assays on a single set of patient samples. Recent work has proposed sparse variants of CCA to address the high dimensionality of such data. However, classical and sparse CCA are based on linear models, and are thus limited in their ability to find general correlations. In this paper, we present two approaches to high-dimensional nonparametric CCA, building on recent developments in high-dimensional nonparametric regression. We present estimation procedures for both approaches, and analyze their theoretical properties in the high-dimensional setting. We demonstrate the effectiveness of these procedures in discovering nonlinear correlations via extensive simulations, as well as through experiments with genomic data.
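The limitation of linear CCA mentioned above can be seen on toy data. The sketch below computes sample canonical correlations with the standard QR-based formulation and applies them to two views linked by a purely quadratic relationship. The toy data and the hand-picked squared features are assumptions made for illustration only; they are not the sparse additive estimators proposed in the paper.

```python
import numpy as np

def canonical_correlations(x, y):
    """Sample canonical correlations of two data matrices (rows are samples).

    After centering, the singular values of Qx^T Qy are the canonical
    correlations, where Qx and Qy are orthonormal bases from thin QR
    factorizations. Assumes both centered matrices have full column rank.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
y = x ** 2 + 0.1 * rng.normal(size=(n, 2))     # second view depends on x only through x^2

# Linear CCA on the raw views: the symmetric quadratic link is essentially invisible.
rho_linear = canonical_correlations(x, y)[0]

# Same routine after adding squared features to the first view (hand-picked, illustrative).
rho_augmented = canonical_correlations(np.hstack([x, x ** 2]), y)[0]

print("linear CCA, first canonical correlation:   ", rho_linear)
print("with squared features, first canonical corr:", rho_augmented)
```

The first number should be close to zero and the second close to one, which is the gap that nonparametric variants of CCA aim to close without hand-picking the nonlinear features.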
Sparse Canonical Correlation Analysis
Hardoon, David R., Shawe-Taylor, John
We present a novel method for solving Canonical Correlation Analysis (CCA) in a sparse convex framework using a least squares approach. The presented method focuses on the scenario when one is interested in (or limited to) a primal representation for the first view while having a dual representation for the second view. Sparse CCA (SCCA) minimises the number of features used in both the primal and dual projections while maximising the correlation between the two views. The method is demonstrated on two paired corpora, English-French and English-Spanish, for mate-retrieval. We observe in the mate-retrieval task that, when the number of original features is large, SCCA outperforms Kernel CCA (KCCA), learning the common semantic space from a sparse set of features.