Goto

Collaborating Authors

 Dimensionality Reduction


The Effects of Spectral Dimensionality Reduction on Hyperspectral Pixel Classification: A Case Study

arXiv.org Artificial Intelligence

This paper presents a systematic study of the effects of hyperspectral pixel dimensionality reduction on the pixel classification task. We use five dimensionality reduction methods -- PCA, KPCA, ICA, AE, and DAE -- to compress 301-dimensional hyperspectral pixels. Compressed pixels are subsequently used to perform pixel classifications. Pixel classification accuracies together with compression method, compression rates, and reconstruction errors provide a new lens to study the suitability of a compression method for the task of pixel classification. We use three high-resolution hyperspectral image datasets, representing three common landscape types (i.e. urban, transitional suburban, and forests) collected by the Remote Sensing and Spatial Ecosystem Modeling laboratory of the University of Toronto. We found that PCA, KPCA, and ICA post greater signal reconstruction capability; however, when compression rates are more than 90\% these methods show lower classification scores. AE and DAE methods post better classification accuracy at 95\% compression rate, however their performance drops as compression rate approaches 97\%. Our results suggest that both the compression method and the compression rate are important considerations when designing a hyperspectral pixel classification pipeline.


SLISEMAP: Explainable Dimensionality Reduction

arXiv.org Artificial Intelligence

Existing explanation methods for black-box supervised learning models generally work by building local models that explain the models behaviour for a particular data item. It is possible to make global explanations, but the explanations may have low fidelity for complex models. Most of the prior work on explainable models has been focused on classification problems, with less attention on regression. We propose a new manifold visualization method, SLISEMAP, that at the same time finds local explanations for all of the data items and builds a two-dimensional visualization of model space such that the data items explained by the same model are projected nearby. We provide an open source implementation of our methods, implemented by using GPU-optimized PyTorch library. SLISEMAP works both on classification and regression models. We compare SLISEMAP to most popular dimensionality reduction methods and some local explanation methods. We provide mathematical derivation of our problem and show that SLISEMAP provides fast and stable visualizations that can be used to explain and understand black box regression and classification models.


Scalable semi-supervised dimensionality reduction with GPU-accelerated EmbedSOM

arXiv.org Machine Learning

Abstract: Dimensionality reduction methods have found vast application as visualization tools in diverse areas of science. Although many different methods exist, their performance is often insufficient for providing quick insight into many contemporary datasets, and the unsupervised mode of use prevents the users from utilizing the methods for dataset exploration and finetuning the details for improved visualization quality. BlosSOM builds on a GPUaccelerated implementation of the EmbedSOM algorithm, complemented by several landmarkbased algorithms for interfacing the unsupervised model learning algorithms with the user supervision. We show the application of BlosSOM on realistic datasets, where it helps to produce high-quality visualizations that incorporate user-specified layout and focus on certain features. We believe the semi-supervised dimensionality reduction will improve the data visualization possibilities for science areas such as single-cell cytometry, and provide a fast and efficient base methodology for new directions in dataset exploration and annotation. Dimensionality reduction algorithms emerged as indispensable utilities that enable various forms of intuitive data visualization, providing insight that in turn simplifies rigorous data analysis. Various algorithms have been proposed for graphs and high-dimensional point-cloud data, and many different types of datasets that can be represented with a graph structure or embedded into vector spaces. Performance of the non-linear dimensionality reduction algorithms becomes a concern if the analysis pipeline is required to scale or when the results are required in a limited amount of time such as in clinical settings. The most popular methods, typically based on neighborhood embedding computed by stochastic descent, force-based layouting or neural autoencoders, reach applicability limits when the dataset size is too large. To tackle the limitations, we have previously developed EmbedSOM [15], a dimensionality reduction and visualization algorithm based on self-organizing maps (SOMs) [13]. EmbedSOM provided an order-of-magnitude speedup on datasets typical for the single-cell cytometry data visualization while retaining competitive quality of the results. The concept has proven useful for interactive and high-performance workflows in cytometry [16, 14], and easily applies to many other types of datasets.


Funnels: Exact maximum likelihood with dimensionality reduction

arXiv.org Machine Learning

Normalizing flows are diffeomorphic, typically dimension-preserving, models trained using the likelihood of the model. We use the SurVAE framework to construct dimension reducing surjective flows via a new layer, known as the funnel. We demonstrate its efficacy on a variety of datasets, and show it improves upon or matches the performance of existing flows while having a reduced latent space size. The funnel layer can be constructed from a wide range of transformations including restricted convolution and feed forward layers.


Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design

arXiv.org Artificial Intelligence

Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in the projection which is designed to project data on to a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore, we first compare this ensuing nested hyperbolic space representation with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.


Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization

arXiv.org Artificial Intelligence

Precision medicine is a clinical approach for disease prevention, detection and treatment, which considers each individual's genetic background, environment and lifestyle. The development of this tailored avenue has been driven by the increased availability of omics methods, large cohorts of temporal samples, and their integration with clinical data. Despite the immense progression, existing computational methods for data analysis fail to provide appropriate solutions for this complex, high-dimensional and longitudinal data. In this work we have developed a new method termed TCAM, a dimensionality reduction technique for multi-way data, that overcomes major limitations when doing trajectory analysis of longitudinal omics data. Using real-world data, we show that TCAM outperforms traditional methods, as well as state-of-the-art tensor-based approaches for longitudinal microbiome data analysis. Moreover, we demonstrate the versatility of TCAM by applying it to several different omics datasets, and the applicability of it as a drop-in replacement within straightforward ML tasks.


Dimensionality Reduction for Machine Learning - neptune.ai

#artificialintelligence

Data forms the foundation of any machine learning algorithm, without it, Data Science can not happen. Sometimes, it can contain a huge number of features, some of which are not even required. Such redundant information makes modeling complicated. Furthermore, interpreting and understanding the data by visualization gets difficult because of the high dimensionality. This is where dimensionality reduction comes into play. Dimensionality reduction is the task of reducing the number of features in a dataset. In machine learning tasks like regression or classification, there are often too many variables to work with. These variables are also called features.


t-SNE Machine Learning Algorithm -- A Great Tool for Dimensionality Reduction in Python

#artificialintelligence

A successful data scientist understands a wide range of Machine Learning algorithms and can explain the results to stakeholders. But, unfortunately, not every stakeholder has a sufficient amount of training to grasp the complexities of ML. Luckily, we can aid our explanations by using dimensionality reduction techniques to create visual representations of high dimensional data. This article will take you through one such technique called t-Distributed Stochastic Neighbor Embedding (t-SNE). Perfect categorization of Machine Learning techniques is not always possible due to the flexibility demonstrated by specific algorithms, making them useful when solving different problems (e.g., one can use k-NN for regression and classification).


Dimensionality Reduction on Face using PCA

#artificialintelligence

Machine Learning has a wide variety of dimensionality reduction techniques. It is one of the most important aspects in the Data Science field. As a result, in this article, I will present one of the most significant dimensionality reduction techniques used today, known as Principal Component Analysis (PCA). But first, we need to understand what Dimensionality Reduction is and why it is so crucial. Dimensionality reduction, also known as dimension reduction, is the transformation of data from a high-dimensional space to a low-dimensional space in such a way that the low-dimensional representation retains some meaningful properties of the original data, preferably close to its underlying dimension.


TLDR: Twin Learning for Dimensionality Reduction

arXiv.org Artificial Intelligence

Dimensionality reduction methods are unsupervised approaches which learn low-dimensional spaces where some properties of the initial space, typically the notion of "neighborhood", are preserved. They are a crucial component of diverse tasks like visualization, compression, indexing, and retrieval. Aiming for a totally different goal, self-supervised visual representation learning has been shown to produce transferable representation functions by learning models that encode invariance to artificially created distortions, e.g. a set of hand-crafted image transformations. Unlike manifold learning methods that usually require propagation on large k-NN graphs or complicated optimization solvers, self-supervised learning approaches rely on simpler and more scalable frameworks for learning. In this paper, we unify these two families of approaches from the angle of manifold learning and propose TLDR, a dimensionality reduction method for generic input spaces that is porting the simple self-supervised learning framework of Barlow Twins to a setting where it is hard or impossible to define an appropriate set of distortions by hand. We propose to use nearest neighbors to build pairs from a training set and a redundancy reduction loss borrowed from the self-supervised literature to learn an encoder that produces representations invariant across such pairs. TLDR is a method that is simple, easy to implement and train, and of broad applicability; it consists of an offline nearest neighbor computation step that can be highly approximated, and a straightforward learning process that does not require mining negative samples to contrast, eigendecompositions, or cumbersome optimization solvers. By replacing PCA with TLDR, we are able to increase the performance of GeM-AP by 4% mAP for 128 dimensions, and to retain its performance with 16x fewer dimensions.