In the previous post, we saw why Principal Component Analysis (PCA) is worth our interest. In this post, we will take a deep dive and see how it is implemented. Now that you have some idea of how to reduce higher-dimensional data to lower dimensions, we will walk through the description below, which is presented in a Jupyter notebook. I have downloaded data from Quandl for three companies listed on the Indian stock market. We will use it to try to understand the Indian ecosystem.
The purpose of this post is to give the reader a detailed understanding of Principal Component Analysis, along with the necessary mathematical proofs. We typically plot data to find patterns in it, or use it to train machine learning models. One way to think about dimensions: suppose you have a data point x. If we treat this data point as a physical object, then the dimensions are merely points of view, such as where the data is located when it is observed from the horizontal axis or from the vertical axis. As the number of dimensions increases, the data becomes harder to visualize and more expensive to compute on.
Variance: a measure of variability, or simply of how spread out the data set is. Mathematically, it is the average squared deviation from the mean.
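To make the definition of variance concrete, here is a minimal sketch in Python with NumPy. The function name `population_variance` and the `prices` values are made up for illustration; they are not the Quandl data used in the notebook. The sketch assumes the "average squared deviation" divides by the number of observations n (the population variance), which matches NumPy's default.

```python
import numpy as np

def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()                    # the mean of the data
    return np.mean((x - mean) ** 2)    # average of the squared deviations

# Hypothetical closing prices, chosen only so the arithmetic is easy to follow.
prices = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var = population_variance(prices)
print(var)             # 4.0: the mean is 5, squared deviations sum to 32, and 32 / 8 = 4
print(np.var(prices))  # NumPy's built-in uses the same definition by default (ddof=0)
```

If you want the sample variance instead, which divides by n - 1, `np.var(prices, ddof=1)` computes it; that is also the convention `np.cov` uses by default.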