Training a Machine Learning Model on a Dataset with Highly-Correlated Features
In a previous article, Feature Selection and Dimensionality Reduction Using Covariance Matrix Plot, we showed that a covariance matrix plot can be used for feature selection and dimensionality reduction, which allowed us to reduce the dimension of our feature space from 6 to 4. Now suppose we want to build a model on the new feature space to predict the crew variable. The covariance matrix plot shows strong correlation among the features (predictor variables); see the image above. In this article, we use a technique called Principal Component Analysis (PCA) to transform our features into a space where they are uncorrelated. We then train our model in the PCA space.
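The idea can be illustrated with a minimal NumPy sketch. It is not the article's own code: the data below is synthetic and hypothetical (four correlated features driven by one shared factor, standing in for the four selected predictors), and ordinary least squares is assumed as the model. The sketch standardizes the features, projects them onto their principal components, checks that the transformed features are uncorrelated, and fits a regression in the PCA space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 samples, 4 strongly correlated features
# driven by one shared latent factor, plus a target that depends on them.
n_samples = 200
latent = rng.normal(size=(n_samples, 1))
X = latent @ np.array([[1.0, 0.9, 1.1, 0.8]]) + 0.1 * rng.normal(size=(n_samples, 4))
y = X @ np.array([0.5, 0.2, 0.1, 0.3]) + 0.05 * rng.normal(size=n_samples)

# Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition: the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
Z = X_std @ Vt.T  # features expressed in the PCA space

# Covariance matrix of the transformed features; off-diagonal entries
# should vanish up to floating-point error.
cov_Z = np.cov(Z, rowvar=False)
off_diag = cov_Z - np.diag(np.diag(cov_Z))

# Fraction of total variance carried by each principal component.
explained = S**2 / np.sum(S**2)

# Ordinary least squares in the PCA space (with an intercept column).
A = np.column_stack([np.ones(n_samples), Z])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_pred = A @ coef
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print("max |off-diagonal covariance|:", np.abs(off_diag).max())
print("explained variance ratio:", explained)
print("R^2 in PCA space:", r2)
```

Keeping all four components gives the same least-squares fit as regressing on the standardized features directly; what changes is that the inputs are now uncorrelated. Dropping the components with the smallest explained variance would reduce the dimension further, at the cost of discarding some information.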
Oct-22-2019, 00:24:44 GMT