Nonlinear dimensionality reduction methods are a popular tool for data scientists and researchers who visualize complex, high-dimensional data. However, while these methods continue to improve and grow in number, it is often difficult to evaluate the quality of a visualization, owing to factors such as a lack of information about the intrinsic dimension of the data and the additional tuning that many evaluation metrics require. In this paper, we provide a systematic comparison of dimensionality reduction quality metrics on datasets where the ground truth manifold is known. We use each metric for hyperparameter optimization in popular dimensionality reduction methods used for visualization, and we report quantitative results that objectively compare the visualizations to their original manifold. We find a few methods that consistently do well and propose the best performer as a benchmark for evaluating dimensionality-reduction-based visualizations.
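The workflow described above — scoring candidate embeddings with a quality metric and using that score to pick hyperparameters — can be sketched as follows. This is a minimal illustration, not the paper's actual protocol: the dataset (a Swiss roll), the method (Isomap), and the metric (scikit-learn's rank-based trustworthiness score) are all assumptions chosen for brevity.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, trustworthiness

# Synthetic manifold with a known ground truth structure.
X, _ = make_swiss_roll(n_samples=400, random_state=0)

# Sweep one hyperparameter and score each embedding with a quality
# metric, keeping the setting that best preserves local neighborhoods.
best_k, best_score = None, -np.inf
for k in (5, 10, 20):
    emb = Isomap(n_neighbors=k, n_components=2).fit_transform(X)
    score = trustworthiness(X, emb, n_neighbors=10)  # in [0, 1]
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```

The same loop generalizes to any embedding method and any scalar quality metric; the paper's contribution is deciding which metric should fill that role.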
By Rosaria Silipo. Machine Learning is all the rage as companies try to make sense of the mountains of data they are collecting. Data is everywhere and proliferating at unprecedented speed. But more data is not always better. In fact, large amounts of data can not only considerably slow down system execution but can sometimes even degrade performance in Data Analytics applications.
As an important machine learning topic, dimensionality reduction has been widely studied and applied in a broad range of areas. A multitude of dimensionality reduction methods have been developed, among which unsupervised methods are preferable when obtaining label information requires onerous work. However, most previous unsupervised dimensionality reduction methods require an affinity graph to be constructed beforehand, on which the subsequent dimensionality reduction steps are then performed. Separating graph construction from dimensionality reduction leaves the reduction process highly dependent on the quality of the input graph. In this paper, we propose a novel graph embedding method for unsupervised dimensionality reduction.
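The two-step pipeline the abstract critiques — build an affinity graph first, then reduce dimensionality on that fixed graph — can be illustrated with a short sketch. This is an assumed, generic instance of the pattern (a k-NN connectivity graph fed to spectral embedding via scikit-learn), not the method proposed in the paper:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import kneighbors_graph

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Step 1: construct the affinity graph beforehand (k-NN connectivity,
# symmetrized so it is a valid affinity matrix).
G = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
A = 0.5 * (G + G.T)

# Step 2: run dimensionality reduction on the fixed graph. Whatever
# errors step 1 made are baked in: the embedding quality is bounded
# by the quality of the input graph.
emb = SpectralEmbedding(n_components=2, affinity="precomputed").fit_transform(A)
print(emb.shape)
```

A joint graph-embedding method, by contrast, would adapt the graph while learning the embedding rather than freezing it after step 1.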
Dimensionality reduction is an essential step in high-dimensional data analysis. Many dimensionality reduction algorithms have been applied successfully to multi-class and multi-label problems, commonly as a separate data preprocessing step before classification. In this paper, we study a joint learning framework in which we perform dimensionality reduction and multi-label classification simultaneously. We show that when the least squares loss is used in classification, this joint learning decouples into two separate components, i.e., dimensionality reduction followed by multi-label classification. This analysis partially justifies the current practice of applying dimensionality reduction separately before classification. We extend our analysis to other loss functions, including the hinge loss and the squared hinge loss. We further extend the formulation to the more general case where the input data for different class labels may differ, overcoming a limitation of traditional dimensionality reduction algorithms. Experiments on benchmark data sets evaluate the proposed joint formulations.
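The decoupled pipeline that the least-squares analysis justifies — dimensionality reduction first, then a closed-form least-squares multi-label classifier on the reduced features — can be sketched as follows. All concrete choices here (PCA as the reducer, a synthetic dataset, the 0.5 decision threshold) are illustrative assumptions, not the paper's formulation:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import PCA

# Synthetic multi-label data: X is (200, 30), Y is a (200, 5) binary
# label indicator matrix.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=30, n_classes=5, random_state=0
)

# Step 1: dimensionality reduction as a separate preprocessing step.
Z = PCA(n_components=10).fit_transform(X)

# Step 2: multi-label classification with the least squares loss,
# W = argmin_W ||Z W - Y||_F^2, solved in closed form.
W, *_ = np.linalg.lstsq(Z, Y, rcond=None)

# Predict by thresholding the real-valued scores (0.5 is arbitrary).
Y_hat = (Z @ W > 0.5).astype(int)
print(Y_hat.shape)
```

The paper's point is that, for this particular loss, jointly optimizing both steps yields the same solution as running them in sequence, so nothing is lost by decoupling.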