Many machine learning algorithms perform worse on data with an extremely large number of features (dimensions), particularly when many of those features are highly sparse. This is where dimensionality reduction can be useful. The idea is to project the high-dimensional data into a lower-dimensional subspace while retaining as much of the variance present in the data as possible. We will initially use two methods (PCA and t-SNE) to explore whether dimensionality reduction is appropriate for our lyric data, and to get an early indication of a good range of target dimensions to reduce to.
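As a minimal sketch of this workflow, the snippet below uses scikit-learn's PCA and t-SNE on a randomly generated sparse matrix standing in for lyric features (the real data, its vectorisation, and the chosen component counts are assumptions here, not the actual pipeline):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for a sparse, high-dimensional lyric feature matrix
# (e.g. TF-IDF vectors): 200 "songs" x 1000 "terms", ~5% non-zero.
rng = np.random.default_rng(42)
X = rng.random((200, 1000)) * (rng.random((200, 1000)) < 0.05)

# PCA: linear projection onto the directions of greatest variance.
pca = PCA(n_components=50, random_state=42)
X_pca = pca.fit_transform(X)
print(X_pca.shape)                          # (200, 50)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained

# t-SNE: non-linear embedding, typically to 2-D for visualisation.
# Reducing with PCA first is common practice for sparse text features.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_pca)
print(X_tsne.shape)                         # (200, 2)
```

Inspecting `explained_variance_ratio_` across different values of `n_components` is one way to judge a sensible target dimensionality before committing to it.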