# Dimensionality Reduction

### Data Dimensionality Reduction in the Age of Machine Learning - DATAVERSITY

Click to learn more about author Rosaria Silipo. Machine Learning is all the rage as companies try to make sense of the mountains of data they are collecting. Data is everywhere and proliferating at unprecedented speed. But, more data is not always better. In fact, large amounts of data can not only considerably slow down the system execution but can sometimes even produce worse performances in Data Analytics applications.

### Visualizing MNIST: An Exploration of Dimensionality Reduction - colah's blog

At some fundamental level, no one understands machine learning. It isn't a matter of things being too complicated. Almost everything we do is fundamentally very simple. Unfortunately, an innate human handicap interferes with us understanding these simple things. Humans evolved to reason fluidly about two and three dimensions. With some effort, we may think in four dimensions. Machine learning often demands we work with thousands of dimensions – or tens of thousands, or millions! Even very simple things become hard to understand when you do them in very high numbers of dimensions. Reasoning directly about these high dimensional spaces is just short of hopeless.

### Deep Variational Sufficient Dimensionality Reduction

We consider the problem of sufficient dimensionality reduction (SDR), where the high-dimensional observation is transformed to a low-dimensional sub-space in which the information of the observations regarding the label variable is preserved. We propose DVSDR, a deep variational approach for sufficient dimensionality reduction. The deep structure in our model has a bottleneck that represent the low-dimensional embedding of the data. We explain the SDR problem using graphical models and use the framework of variational autoencoders to maximize the lower bound of the log-likelihood of the joint distribution of the observation and label. We show that such a maximization problem can be interpreted as solving the SDR problem. DVSDR can be easily adopted to semi-supervised learning setting. In our experiment we show that DVSDR performs competitively on classification tasks while being able to generate novel data samples.

### Optimal terminal dimensionality reduction in Euclidean space

Let $\varepsilon\in(0,1)$ and $X\subset\mathbb R^d$ be arbitrary with $|X|$ having size $n>1$. The Johnson-Lindenstrauss lemma states there exists $f:X\rightarrow\mathbb R^m$ with $m = O(\varepsilon^{-2}\log n)$ such that $$\forall x\in X\ \forall y\in X, \|x-y\|_2 \le \|f(x)-f(y)\|_2 \le (1+\varepsilon)\|x-y\|_2 .$$ We show that a strictly stronger version of this statement holds, answering one of the main open questions of [MMMR18]: "$\forall y\in X$" in the above statement may be replaced with "$\forall y\in\mathbb R^d$", so that $f$ not only preserves distances within $X$, but also distances to $X$ from the rest of space. Previously this stronger version was only known with the worse bound $m = O(\varepsilon^{-4}\log n)$. Our proof is via a tighter analysis of (a specific instantiation of) the embedding recipe of [MMMR18].

### Dimensionality Reduction For Dummies -- Part 1: Intuition

We need to see in order to believe. When you have a dataset with more than three dimensions, it becomes impossible to see what's going on with our eyes. But who said that these extra dimensions are really necessary? Isn't there a way to somehow reduce it to one, two, or three humanly dimensions? It turns out there is.

### Dimensionality Reduction : Does PCA really improve classification outcome?

I have come across a couple of resources about dimensionality reduction techniques. This topic is definitively one of the most interesting ones, and it is great to think that there are algorithms able to reduce the number of features by choosing the most important ones that still represent the entire dataset. One of the advantages pointed out by authors is that these algorithms can improve the results of a classification task. In this post, I am going to verify this statement using a Principal Component Analysis ( PCA) to try to improve the classification performance of a neural network over a dataset. Does PCA really improve classification outcome?

### Spectral feature scaling method for supervised dimensionality reduction

Spectral dimensionality reduction methods enable linear separations of complex data with high-dimensional features in a reduced space. However, these methods do not always give the desired results due to irregularities or uncertainties of the data. Thus, we consider aggressively modifying the scales of the features to obtain the desired classification. Using prior knowledge on the labels of partial samples to specify the Fiedler vector, we formulate an eigenvalue problem of a linear matrix pencil whose eigenvector has the feature scaling factors. The resulting factors can modify the features of entire samples to form clusters in the reduced space, according to the known labels. In this study, we propose new dimensionality reduction methods supervised using the feature scaling associated with the spectral clustering. Numerical experiments show that the proposed methods outperform well-established supervised methods for toy problems with more samples than features, and are more robust regarding clustering than existing methods. Also, the proposed methods outperform existing methods regarding classification for real-world problems with more features than samples of gene expression profiles of cancer diseases. Furthermore, the feature scaling tends to improve the clustering and classification accuracies of existing unsupervised methods, as the proportion of training data increases.

### Nonlinear Dimensionality Reduction for Discriminative Analytics of Multiple Datasets

Principal component analysis (PCA) is widely used for feature extraction and dimensionality reduction, with documented merits in diverse tasks involving high-dimensional data. Standard PCA copes with one dataset at a time, but it is challenged when it comes to analyzing multiple datasets jointly. In certain data science settings however, one is often interested in extracting the most discriminative information from one dataset of particular interest (a.k.a. target data) relative to the other(s) (a.k.a. background data). To this end, this paper puts forth a novel approach, termed discriminative (d) PCA, for such discriminative analytics of multiple datasets. Under certain conditions, dPCA is proved to be least-squares optimal in recovering the component vector unique to the target data relative to background data. To account for nonlinear data correlations, (linear) dPCA models for one or multiple background datasets are generalized through kernel-based learning. Interestingly, all dPCA variants admit an analytical solution obtainable with a single (generalized) eigenvalue decomposition. Finally, corroborating dimensionality reduction tests using both synthetic and real datasets are provided to validate the effectiveness of the proposed methods.

### Branching embedding: A heuristic dimensionality reduction algorithm based on hierarchical clustering

This paper proposes a new dimensionality reduction algorithm named branching embedding (BE). It converts a dendrogram to a two-dimensional scatter plot, and visualizes the inherent structures of the original high-dimensional data. Since the conversion part is not computationally demanding, the BE algorithm would be beneficial for the case where hierarchical clustering is already performed. Numerical experiments revealed that the outputs of the algorithm moderately preserve the original hierarchical structures.

### Randomized ICA and LDA Dimensionality Reduction Methods for Hyperspectral Image Classification

Dimensionality reduction is an important step in processing the hyperspectral images (HSI) to overcome the curse of dimensionality problem. Linear dimensionality reduction methods such as Independent component analysis (ICA) and Linear discriminant analysis (LDA) are commonly employed to reduce the dimensionality of HSI. These methods fail to capture non-linear dependency in the HSI data, as data lies in the nonlinear manifold. To handle this, nonlinear transformation techniques based on kernel methods were introduced for dimensionality reduction of HSI. However, the kernel methods involve cubic computational complexity while computing the kernel matrix, and thus its potential cannot be explored when the number of pixels (samples) are large. In literature a fewer number of pixels are randomly selected to partial to overcome this issue, however this sub-optimal strategy might neglect important information in the HSI. In this paper, we propose randomized solutions to the ICA and LDA dimensionality reduction methods using Random Fourier features, and we label them as RFFICA and RFFLDA. Our proposed method overcomes the scalability issue and to handle the non-linearities present in the data more efficiently. Experiments conducted with two real-world hyperspectral datasets demonstrates that our proposed randomized methods outperform the conventional kernel ICA and kernel LDA in terms overall, per-class accuracies and computational time.