Understanding Dimension Reduction with Principal Component Analysis (PCA)


Big data analytics is a buzzword nowadays, and everyone seems to be talking about it. It has found applications in many sectors, such as medicine, politics, and dating. Though big data analytics is used to improve many aspects of human life, it comes with its own problems. One of them is the 'curse of dimensionality'.

From Principal Subspaces to Principal Components with Linear Autoencoders

arXiv.org Machine Learning

The autoencoder is an effective unsupervised learning model which is widely used in deep learning. It is well known that an autoencoder with a single fully-connected hidden layer, a linear activation function and a squared error cost function trains weights that span the same subspace as the one spanned by the principal component loading vectors, but that they are not identical to the loading vectors. In this paper, we show how to recover the loading vectors from the autoencoder weights.
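The abstract's key claim is that a linear autoencoder's weights span the principal subspace without being the loading vectors themselves. The sketch below illustrates that claim and one simple way to get from a subspace basis back to the loading vectors. It does not reproduce the paper's procedure. It makes three assumptions. First, it does not train an autoencoder. Instead, it simulates "trained decoder weights" `W` as the true loadings multiplied by an arbitrary invertible matrix, which has the same span. Second, the recovery step is a swapped-in technique: it orthonormalizes `W` with a QR decomposition, runs PCA inside that k-dimensional subspace, and maps the result back. Third, the data and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 samples, 5 correlated features, centered.
n, p, k = 500, 5, 2
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X -= X.mean(axis=0)

# Reference PCA loading vectors: top-k eigenvectors of the covariance matrix.
cov = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
loadings = eigvecs[:, np.argsort(eigvals)[::-1][:k]]

# Stand-in (assumption) for trained linear-autoencoder decoder weights (p x k):
# same span as the loadings, but mixed by an arbitrary invertible k x k matrix.
mix = rng.normal(size=(k, k)) + 3 * np.eye(k)
W = loadings @ mix

# Recovery sketch: orthonormalize W, run PCA inside the k-dim subspace,
# then map the small eigenvectors back to the original p-dim space.
Q, _ = np.linalg.qr(W)                            # orthonormal basis of span(W)
Z = X @ Q                                         # data in that basis (n x k)
small_vals, small_vecs = np.linalg.eigh(Z.T @ Z / (n - 1))
recovered = Q @ small_vecs[:, np.argsort(small_vals)[::-1]]

# Loading vectors are only defined up to sign, so compare absolute cosines.
cosines = np.abs(np.sum(recovered * loadings, axis=0))
print("|cos| between W columns and loadings:",
      np.abs(np.sum((W / np.linalg.norm(W, axis=0)) * loadings, axis=0)))
print("|cos| between recovered and true loadings:", cosines)
```

The recovery works because, if `Q = V_k R` for an orthogonal `R`, the covariance of the projected data is `R^T Λ_k R`. Its eigenvectors undo the rotation `R`, leaving `Q` times them equal to the loading vectors up to sign. The extra PCA runs on a k x k matrix, so it is cheap when k is small.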

Principal components


Principal components analysis (PCA) is a statistical technique that identifies underlying linear patterns in a data set, so that the data can be expressed in terms of another data set of significantly lower dimension without much loss of information. The final data set should explain most of the variance of the original data set while using fewer variables. These new variables are called principal components. The following image depicts the activity diagram that shows each step of the principal components analysis, which will be explained in detail later. To illustrate the process described in the diagram, we are going to use the following data set, which has two dimensions.
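The usual PCA steps can be sketched in NumPy as follows: center the data, compute the covariance matrix, eigendecompose it, sort by variance explained, and project. This is a minimal sketch, not the article's own implementation. The two-dimensional data set below is a hypothetical stand-in, because the article's data set is not reproduced here, and the function name `pca` is illustrative.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (samples x features) onto the top-k principal components."""
    # Step 1: center each variable by subtracting its mean.
    X_centered = X - X.mean(axis=0)
    # Step 2: sample covariance matrix of the variables (features x features).
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigendecomposition; eigh suits symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort components by decreasing eigenvalue (variance explained).
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 5: keep the first k loading vectors and project the centered data onto them.
    components = eigvecs[:, :k]
    scores = X_centered @ components
    explained_ratio = eigvals / eigvals.sum()
    return scores, components, explained_ratio

# Hypothetical two-dimensional data set (not the one from the original article).
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

scores, components, explained_ratio = pca(X, k=1)
print("explained variance ratio:", explained_ratio)
print("first principal component:", components[:, 0])
print("1-D scores:", scores.ravel())
```

Keeping only the first component reduces each two-dimensional point to a single score. The explained variance ratio shows how much of the original variance that one number retains.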

Introduction to Principal Component Analysis


The sheer size of data in the modern age is not only a challenge for computer hardware but also the main bottleneck for the performance of many machine learning algorithms. The main goal of PCA is to identify patterns in data: it aims to detect the correlation between variables. Reducing the dimensionality only makes sense if a strong correlation between variables exists. PCA is a statistical method used to reduce the number of variables in a data set.
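To illustrate why correlation matters, the sketch below compares two synthetic two-variable data sets. In one, the second variable is almost a linear function of the first. In the other, the two variables are independent. This is a minimal sketch on hypothetical random data, and the helper name `explained_variance_ratio` is illustrative. With strong correlation, the first principal component captures nearly all of the variance, so dropping the second loses little. With independent variables, the variance is split roughly evenly, and there is little to gain from dropping a component.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, largest first."""
    cov = np.cov(X, rowvar=False)            # np.cov centers the data internally
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order; reverse it
    return eigvals / eigvals.sum()

rng = np.random.default_rng(42)
x = rng.normal(size=1000)

# Strongly correlated pair: the second column is almost a linear function of x.
correlated = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=1000)])
# Independent pair with similar spread: no linear pattern to exploit.
independent = np.column_stack([x, rng.normal(size=1000)])

r_corr = explained_variance_ratio(correlated)
r_ind = explained_variance_ratio(independent)
print("correlated: ", r_corr)
print("independent:", r_ind)
```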