Parsimonious Feature Extraction Methods: Extending Robust Probabilistic Projections with Generalized Skew-t

Toczydlowska, Dorota, Peters, Gareth W., Shevchenko, Pavel V.

arXiv.org Machine Learning 

The study focuses on an extension of Principal Component Analysis (PCA), as defined in [1], [2] or [3]. PCA and related matrix factorisation methodologies are widely used in data-rich environments for dimensionality reduction, data compression, feature extraction and data de-noising. These methodologies identify a lower-dimensional linear subspace to represent the data, capturing the second-order dominant information contained in high-dimensional data sets. PCA can be viewed as a matrix factorisation problem that aims to learn a lower-dimensional representation of the data while preserving its Euclidean structure. However, when the data-generating distribution is non-Gaussian, or when outliers corrupt the data, the standard PCA methodology provides biased information about the lower-rank representation. In many applications, the stochastic noise or observation errors in the data set are assumed to be, in some sense, "well-behaved"; for instance, additive, light-tailed, symmetric and zero-mean. When non-robust feature extraction methods are naively applied in the presence of violations of these implicit statistical assumptions, the information contained in the extracted features cannot be relied upon, resulting in misleading inference. It is therefore critical to ensure that the extracted features capture information about the correct characteristics of the process generating the data. In the following study, we relax the inherent assumption of "well-behaved" observation noise by developing a class of robust estimators that can withstand violations of these assumptions, which routinely arise in real data sets.
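The sensitivity of standard PCA to outliers can be seen in a minimal NumPy sketch. The data set, the corruption pattern, and the `leading_pc` helper below are illustrative assumptions, not taken from the paper: data is generated along a known direction with light-tailed noise, a handful of heavy-tailed outliers is injected in an orthogonal coordinate, and the leading principal component (computed via SVD of the centred data matrix) is compared against the true direction in each case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data concentrated along a known 1-D direction in R^5,
# perturbed by light-tailed (Gaussian) observation noise.
d, n = 5, 500
direction = np.zeros(d)
direction[0] = 1.0
scores = rng.normal(size=n)
X = np.outer(scores, direction) + 0.05 * rng.normal(size=(n, d))

def leading_pc(X):
    """Leading principal component via SVD of the centred data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

pc_clean = leading_pc(X)

# Corrupt a few rows with large outliers in an orthogonal coordinate,
# mimicking heavy-tailed observation errors.
X_corrupt = X.copy()
X_corrupt[:5, 1] += 50.0

pc_corrupt = leading_pc(X_corrupt)

# Alignment with the true direction (absolute cosine, since the sign
# of a principal component is arbitrary).
align_clean = abs(pc_clean @ direction)
align_corrupt = abs(pc_corrupt @ direction)
print(f"clean alignment:     {align_clean:.3f}")
print(f"corrupted alignment: {align_corrupt:.3f}")
```

With only 1% of the rows corrupted, the leading component swings toward the outlier coordinate, which is exactly the bias in the lower-rank representation that motivates the robust estimators developed in the paper.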
