Goto

Collaborating Authors

 Principal Component Analysis


Fair principal component analysis (PCA): minorization-maximization algorithms for Fair PCA, Fair Robust PCA and Fair Sparse PCA

arXiv.org Machine Learning

In this paper we propose a new iterative algorithm to solve the fair PCA (FPCA) problem. We start with the max-min fair PCA formulation originally proposed in [1] and derive a simple and efficient iterative algorithm which is based on the minorization-maximization (MM) approach. The proposed algorithm relies on the relaxation of a semi-orthogonality constraint which is proved to be tight at every iteration of the algorithm. The vanilla version of the proposed algorithm requires solving a semi-definite program (SDP) at every iteration, which can be further simplified to a quadratic program by formulating the dual of the surrogate maximization problem. We also propose two important reformulations of the fair PCA problem: a) fair robust PCA -- which can handle outliers in the data, and b) fair sparse PCA -- which can enforce sparsity on the estimated fair principal components. The proposed algorithms are computationally efficient and monotonically increase their respective design objectives at every iteration. An added feature of the proposed algorithms is that they do not require the selection of any hyperparameter (except for the fair sparse PCA case where a penalty parameter that controls the sparsity has to be chosen by the user). We numerically compare the performance of the proposed methods with two of the state-of-the-art approaches on synthetic data sets and a real-life data set.


Theoretical Guarantees for Sparse Principal Component Analysis based on the Elastic Net

arXiv.org Machine Learning

Sparse principal component analysis (SPCA) is widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Despite many methodological and theoretical developments in the past two decades, the theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie & Tibshirani (2006) are still unknown. This paper aims to address this critical gap. We first revisit the SPCA algorithm of Zou et al. (2006) and present our implementation. We also study a computationally more efficient variant of the SPCA algorithm in Zou et al. (2006) that can be considered as the limiting case of SPCA. We provide the guarantees of convergence to a stationary point for both algorithms and prove that, under a sparse spiked covariance model, both algorithms can recover the principal subspace consistently under mild regularity conditions. We show that their estimation error bounds match the best available bounds of existing works or the minimax rates up to some logarithmic factors. Moreover, we demonstrate the competitive numerical performance of both algorithms in numerical studies.


Node Splitting: A Constructive Algorithm for Feed-Forward Neural Networks

Neural Information Processing Systems

A constructive algorithm is proposed for feed-forward neural networks, which uses node-splitting in the hidden layers to build large networks from smaller ones. The small network forms an approximate model of a set of training data, and the split creates a larger more powerful network which is initialised with the approximate solution already found. The insufficiency of the smaller network in modelling the system which generated the data leads to oscillation in those hidden nodes whose weight vectors cover re(cid:173) gions in the input space where more detail is required in the model. These nodes are identified and split in two using principal component analysis, allowing the new nodes t.o cover the two main modes of each oscillating vector. Nodes are selected for splitting using principal component analysis on the oscillating weight vectors, or by examining the Hessian matrix of second derivatives of the network error with respect to the weight.s.


Self-Organizing Rules for Robust Principal Component Analysis

Neural Information Processing Systems

In the presence of outliers, the existing self-organizing rules for Principal Component Analysis (PCA) perform poorly. Using sta(cid:173) tistical physics techniques including the Gibbs distribution, binary decision fields and effective energies, we propose self-organizing PCA rules which are capable of resisting outliers while fulfilling various PCA-related tasks such as obtaining the first principal com(cid:173) ponent vector, the first k principal component vectors, and directly finding the subspace spanned by the first k vector principal com(cid:173) ponent vectors without solving for each vector individually. Com(cid:173) parative experiments have shown that the proposed robust rules improve the performances of the existing PCA algorithms signifi(cid:173) cantly when outliers are present.


Unsupervised Parallel Feature Extraction from First Principles

Neural Information Processing Systems

We describe a number of learning rules that can be used to train un(cid:173) supervised parallel feature extraction systems. The learning rules are derived using gradient ascent of a quality function. We con(cid:173) sider a number of quality functions that are rational functions of higher order moments of the extracted feature values. We show that one system learns the principle components of the correla(cid:173) tion matrix. Principal component analysis systems are usually not optimal feature extractors for classification.


Non-Linear Statistical Analysis and Self-Organizing Hebbian Networks

Neural Information Processing Systems

Neurons learning under an unsupervised Hebbian learning rule can perform a nonlinear generalization of principal component analysis. This relationship between nonlinear PCA and nonlinear neurons is reviewed. The stable fixed points of the neuron learning dynamics correspond to the maxima of the statist,ic optimized under non(cid:173) linear PCA. However, in order to predict. This is shown for a simple model. Methods of statistical mechanics can be used to find the optima of the objective function of non-linear PCA.


Analysis of Unstandardized Contributions in Cross Connected Networks

Neural Information Processing Systems

Understanding knowledge representations in neural nets has been a difficult problem. Principal components analysis (PCA) of contributions (products of sending activations and connection weights) has yielded valuable insights into knowledge representations, but much of this work has focused on the correlation matrix of contributions. The present work shows that analyzing the variance-covariance matrix of contributions yields more valid insights by taking account of weights.


Recognizing Handwritten Digits Using Mixtures of Linear Models

Neural Information Processing Systems

We construct a mixture of locally linear generative models of a col(cid:173) lection of pixel-based images of digits, and use them for recogni(cid:173) tion. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane informa(cid:173) tion [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.


Dynamic Features for Visual Speechreading: A Systematic Comparison

Neural Information Processing Systems

Humans use visual as well as auditory speech signals to recognize spoken words. A variety of systems have been investigated for per(cid:173) forming this task. The main purpose of this research was to sys(cid:173) tematically compare the performance of a range of dynamic visual features on a speechreading task. We have found that normal(cid:173) ization of images to eliminate variation due to translation, scale, and planar rotation yielded substantial improvements in general(cid:173) ization performance regardless of the visual representation used. In addition, the dynamic information in the difference between suc(cid:173) cessive frames yielded better performance than optical-flow based approaches, and compression by local low-pass filtering worked sur(cid:173) prisingly better than global principal components analysis (PCA).


Viewpoint Invariant Face Recognition using Independent Component Analysis and Attractor Networks

Neural Information Processing Systems

We have explored two approaches to recogmzmg faces across changes in pose. First, we developed a representation of face images based on independent component analysis (ICA) and compared it to a principal component analysis (PCA) representation for face recognition. The ICA basis vectors for this data set were more spatially local than the PCA basis vectors and the ICA representa(cid:173) tion had greater invariance to changes in pose. Second, we present a model for the development of viewpoint invariant responses to faces from visual experience in a biological system. The temporal continuity of natural visual experience was incorporated into an attractor network model by Hebbian learning following a lowpass temporal filter on unit activities.