Scalable Machine Learning on Spark
Here, we're observing the mean and variance of each feature. This helps determine whether we need to normalize the features, since it's useful to have all features on a similar scale. We're also taking note of non-zero values, which can adversely impact model performance. Another important metric to analyze is the correlation between features in the input data:
Matrix correlMatrix = Statistics.corr(inputData.rdd(),
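To make the quantities mentioned here concrete, the following is a minimal plain-Java sketch of what a per-feature mean, variance, non-zero count, and Pearson correlation measure. It deliberately does not use Spark, so it is not the article's code: the class name `FeatureStats`, its method names, and the sample values are all hypothetical, and the variance shown is the sample variance (dividing by n - 1), which is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only; not part of Spark MLlib.
class FeatureStats {

    // Arithmetic mean of one feature column.
    static double mean(double[] x) {
        return Arrays.stream(x).average().orElse(Double.NaN);
    }

    // Sample variance (divides by n - 1); NaN when fewer than two values.
    static double variance(double[] x) {
        if (x.length < 2) {
            return Double.NaN;
        }
        double m = mean(x);
        double sumSquares = 0.0;
        for (double v : x) {
            sumSquares += (v - m) * (v - m);
        }
        return sumSquares / (x.length - 1);
    }

    // Number of entries in the column that are not exactly zero.
    static long numNonzeros(double[] x) {
        return Arrays.stream(x).filter(v -> v != 0.0).count();
    }

    // Pearson correlation between two equally long feature columns.
    // Returns NaN if either column is constant (zero variance).
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Made-up values for two features, used only to exercise the methods.
        double[] featureA = {5.1, 4.9, 6.3, 5.8};
        double[] featureB = {1.4, 1.4, 4.9, 4.0};

        System.out.println("mean(A)      = " + mean(featureA));
        System.out.println("variance(A)  = " + variance(featureA));
        System.out.println("nonzeros(A)  = " + numNonzeros(featureA));
        System.out.println("pearson(A,B) = " + pearson(featureA, featureB));
    }
}
```

Features whose variances differ by orders of magnitude are candidates for normalization, and a pair of features with a correlation close to 1 or -1 carries largely redundant information.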
Oct-7-2020, 00:10:29 GMT