How to cross-validate PCA, clustering, and matrix decomposition models · Its Neuronal
TL;DR I cover how cross-validation is a somewhat tricky problem for matrix factorization models (including PCA & clustering as special cases) and provide some Python code snippets for fitting these models with held out data. Cross-validation is a fundamental paradigm in modern data analysis. However, it is largely applied to supervised settings, such as regression and classification. Here, the procedure is simple: fit your model on, say, 90% of the data (the training set), and evaluate its performance on the remaining 10% (the test set). However, this idea does not easily extend to other unsupervised methods, such as dimensionality reduction methods or clustering.
Mar-9-2018, 13:59:01 GMT
- Technology: