Non-Linear Spectral Dimensionality Reduction Under Uncertainty Artificial Intelligence

In this paper, we consider the problem of non-linear dimensionality reduction under uncertainty from both theoretical and algorithmic perspectives. Since real-world data usually contain measurements with uncertainties and artifacts, the input space in the proposed framework consists of probability distributions that model the uncertainty associated with each sample. We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches, e.g., KPCA and MDA/KMFA, to accept probability distributions as inputs instead of the original data. We show that the proposed NGEU formulation admits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework. Empirical results on different datasets show the effectiveness of the proposed framework.

Dimensionality Reduction Meets Message Passing for Graph Node Embeddings Machine Learning

Graph Neural Networks (GNNs) have become a popular approach for various applications, ranging from social network analysis to modeling chemical properties of molecules. While GNNs often show remarkable performance on public datasets, they can struggle to learn long-range dependencies in the data due to over-smoothing and over-squashing tendencies. To alleviate this challenge, we propose PCAPass, a method that combines Principal Component Analysis (PCA) and message passing to generate node embeddings in an unsupervised manner and leverages gradient-boosted decision trees for classification tasks. We show empirically that this approach provides competitive performance compared to popular GNNs on node classification benchmarks, while gathering information from longer-distance neighborhoods. Our research demonstrates that applying dimensionality reduction with message passing and skip connections is a promising mechanism for aggregating long-range dependencies in graph-structured data.
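As a rough illustration of the general idea in the abstract above (alternating neighborhood aggregation with PCA compression, with a skip connection that keeps each node's own embedding), a minimal NumPy sketch might look as follows. This is not the authors' PCAPass implementation: the mean aggregation, the concatenation-based skip connection, and the names `pca_reduce`, `pca_message_passing`, `n_layers`, and `k` are all assumptions made for this example, and the gradient-boosted classification stage is omitted.

```python
import numpy as np

def pca_reduce(H, k):
    """Center H and project it onto its top-k principal directions."""
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:k].T

def pca_message_passing(A, X, n_layers=3, k=4):
    """Hypothetical sketch: alternate mean aggregation with PCA compression.

    A: (n, n) adjacency matrix, X: (n, f) node features.
    Each layer widens the receptive field by one hop.
    """
    deg = A.sum(axis=1, keepdims=True)
    A_norm = A / np.maximum(deg, 1.0)      # row-normalized: mean over neighbors
    H = pca_reduce(X, k)
    for _ in range(n_layers):
        neighbor_mean = A_norm @ H         # message passing step
        # Concatenating H with the aggregated messages acts as a skip connection;
        # PCA then compresses the result back to k dimensions.
        H = pca_reduce(np.hstack([H, neighbor_mean]), k)
    return H

# Toy ring graph with 12 nodes and random 6-dimensional features (made-up data).
n = 12
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1.0
    A[(i + 1) % n, i] = 1.0
rng = np.random.default_rng(0)
X = rng.normal(size=(n, 6))
embeddings = pca_message_passing(A, X, n_layers=3, k=4)
```

The resulting `embeddings` array could then be passed to any off-the-shelf classifier; the abstract mentions gradient-boosted decision trees for that stage.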

Dimensionality Reduction on Face using PCA


Machine learning offers a wide variety of dimensionality reduction techniques, and dimensionality reduction is one of the most important topics in data science. In this article, I will present one of the most significant dimensionality reduction techniques used today, known as Principal Component Analysis (PCA). But first, we need to understand what dimensionality reduction is and why it is so crucial. Dimensionality reduction, also known as dimension reduction, is the transformation of data from a high-dimensional space to a low-dimensional space in such a way that the low-dimensional representation retains some meaningful properties of the original data, ideally with a dimension close to the data's intrinsic dimension.
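To make the definition concrete, here is a minimal sketch of PCA computed via the singular value decomposition, using only NumPy. The function name `pca` and the synthetic data (200 points lying close to a 2-dimensional plane inside a 10-dimensional space) are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each retained direction.
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))            # true 2-D structure
mixing = rng.normal(size=(2, 10))             # embed it in 10 dimensions
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

Z, components, explained_variance = pca(X, n_components=2)

# Map the 2-D representation back to 10-D to see how much was lost.
X_hat = Z @ components + X.mean(axis=0)
reconstruction_mse = np.mean((X - X_hat) ** 2)
```

Because the synthetic data has an intrinsic dimension of two, the 2-dimensional representation `Z` reconstructs `X` almost perfectly, which is the sense in which the low-dimensional representation "retains meaningful properties" of the original.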

Interactive Dimensionality Reduction for Comparative Analysis Machine Learning

Finding the similarities and differences between groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. This paper presents an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, the interactive visualization interface facilitates interpretation and refinement of the ULCA results. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of this framework.

Techniques for Dimensionality Reduction


In addition to this, the recent 'Big Bang' in large datasets across companies, organisations, and government departments has resulted in a widespread uptake of data mining techniques. So, what is data mining? Simply put, it's the process of discovering trends and insights in high-dimensional datasets (those with thousands of columns). On the one hand, high-dimensional datasets have enabled organisations to solve complex, real-world problems, such as reducing cancer patient waiting time, predicting protein structure associated with COVID-19, and analysing MEG brain imaging scans. On the other hand, large datasets can sometimes contain columns with poor-quality data, which can lower the performance of the model; more isn't always better.

Laplacian-Based Dimensionality Reduction Including Spectral Clustering, Laplacian Eigenmap, Locality Preserving Projection, Graph Embedding, and Diffusion Map: Tutorial and Survey Machine Learning

This is a tutorial and survey paper on nonlinear dimensionality reduction and feature extraction methods that are based on the Laplacian of the graph of data. We first introduce the adjacency matrix, the definition of the Laplacian matrix, and the interpretation of the Laplacian. Then, we cover graph cuts and spectral clustering, which applies clustering in a subspace of the data. Different optimization variants of the Laplacian eigenmap and its out-of-sample extension are explained. Thereafter, we introduce the locality preserving projection and its kernel variant as linear special cases of the Laplacian eigenmap. Versions of graph embedding are then explained, which are generalized versions of the Laplacian eigenmap and locality preserving projection. Finally, the diffusion map is introduced, which is a method based on the Laplacian of the data and random walks on the data graph.
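The pipeline this survey describes (adjacency matrix, then Laplacian, then eigenvectors as embedding coordinates) can be sketched in a few lines of NumPy. The version below is one common variant and not necessarily the formulation used in the paper: it assumes a fully connected Gaussian-kernel graph, solves the generalized problem L v = λ D v through the symmetric normalized Laplacian, and drops the trivial constant eigenvector. The function name `laplacian_eigenmap` and the bandwidth parameter `sigma` are choices made for this example.

```python
import numpy as np

def laplacian_eigenmap(X, n_components=2, sigma=1.0):
    """Embed the rows of X using eigenvectors of the graph Laplacian."""
    # Adjacency matrix from a Gaussian kernel on pairwise squared distances.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    d = W.sum(axis=1)                  # node degrees
    L = np.diag(d) - W                 # unnormalized Laplacian L = D - W

    # Solve L v = lambda D v via L_sym = D^{-1/2} L D^{-1/2}, with v = D^{-1/2} u.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order

    # Skip the first eigenvector (eigenvalue ~0, constant after rescaling).
    embedding = d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]
    return embedding, eigvals

# Two made-up Gaussian blobs, 20 points each, centered 5 units apart.
rng = np.random.default_rng(0)
blob_a = 0.3 * rng.normal(size=(20, 2))
blob_b = 0.3 * rng.normal(size=(20, 2)) + np.array([5.0, 0.0])
points = np.vstack([blob_a, blob_b])

embedding, eigvals = laplacian_eigenmap(points, n_components=2, sigma=1.0)
```

On this toy data, the first embedding coordinate takes opposite signs on the two blobs, which is the property spectral clustering exploits when it clusters in the eigenvector subspace.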

A Subspace-based Approach for Dimensionality Reduction and Important Variable Selection Machine Learning

An analysis of high dimensional data can offer a detailed description of a system but is often challenged by the curse of dimensionality. General dimensionality reduction techniques can alleviate such difficulty by extracting a few important features, but they are limited due to the lack of interpretability and connectivity to actual decision making associated with each physical variable. Important variable selection techniques, as an alternative, can maintain the interpretability, but they often involve a greedy search that is susceptible to failure in capturing important interactions. This research proposes a new method that produces subspaces, reduced-dimensional physical spaces, based on a randomized search and forms an ensemble of models for critical subspaces. When applied to high-dimensional data collected from a composite metal development process, the proposed method shows its superiority in prediction and important variable selection.

Divergence Regulated Encoder Network for Joint Dimensionality Reduction and Classification Artificial Intelligence

In this paper, we investigate performing joint dimensionality reduction and classification using a novel histogram neural network. Motivated by a popular dimensionality reduction approach, t-Distributed Stochastic Neighbor Embedding (t-SNE), our proposed method incorporates a classification loss computed on samples in a low-dimensional embedding space. We compare the learned sample embeddings against coordinates found by t-SNE in terms of classification accuracy and qualitative assessment. We also explore use of various divergence measures in the t-SNE objective. The proposed method has several advantages such as readily embedding out-of-sample points and reducing feature dimensionality while retaining class discriminability. Our results show that the proposed approach maintains and/or improves classification performance and reveals characteristics of features produced by neural networks that may be helpful for other applications.

The Dilemma Between Dimensionality Reduction and Adversarial Robustness Machine Learning

Recent work has shown the tremendous vulnerability of deep learning models to adversarial samples, which are nearly indistinguishable from benign data but are misclassified by the model. Some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of these models, as a direct result of their sensitivity to well-generalizing features in high-dimensional data. We hypothesize that data transformations can influence this vulnerability, since a change in the data manifold directly determines the adversary's ability to create these adversarial samples. To approach this problem, we study the effect of dimensionality reduction through the lens of adversarial robustness. This study raises awareness of the positive and negative impacts of five commonly used data transformation techniques on adversarial robustness. The evaluation shows how these techniques contribute to an overall increased vulnerability, where accuracy is only improved when the dimensionality reduction technique approaches the data's optimal intrinsic dimension. The conclusions drawn from this work contribute to understanding and creating more resistant learning models.

Multi-Criteria Dimensionality Reduction with Applications to Fairness

Neural Information Processing Systems

Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the multi-criteria dimensionality reduction problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as the Fair-PCA problem introduced by Samadi et al. [NeurIPS18] and the Nash Social Welfare (NSW) problem. In the Fair-PCA problem, the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum reconstruction error of any one group is minimized.
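The contrast between the standard PCA objective (average reconstruction error over all data) and the group-wise objective described above (maximum reconstruction error over groups) can be illustrated numerically. The sketch below only evaluates both objectives on made-up data; it does not implement the optimization algorithm from the paper. The helper names, the two synthetic groups, and the hand-picked "balanced" direction are assumptions for illustration, and the data are zero-mean by construction so centering is skipped.

```python
import numpy as np

def reconstruction_error(A, V):
    """Average squared error of the rows of A after projecting onto span(V)."""
    P = V @ V.T                        # orthogonal projector (V has orthonormal columns)
    return np.sum((A - A @ P) ** 2) / A.shape[0]

def max_group_error(groups, V):
    """Group-wise objective: the worst reconstruction error over all groups."""
    return max(reconstruction_error(A, V) for A in groups)

def top_d_directions(A, d):
    """Standard PCA directions (data assumed already centered)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:d].T

rng = np.random.default_rng(0)
# Group A (300 samples) varies mostly along the first axis,
# group B (100 samples) mostly along the second axis.
group_a = np.column_stack([3.0 * rng.normal(size=300), 0.1 * rng.normal(size=300)])
group_b = np.column_stack([0.1 * rng.normal(size=100), 3.0 * rng.normal(size=100)])
groups = [group_a, group_b]

# PCA on the pooled data follows the larger group.
V_pca = top_d_directions(np.vstack(groups), d=1)
# A hand-picked direction that treats both groups symmetrically.
V_balanced = np.array([[1.0], [1.0]]) / np.sqrt(2.0)

err_pca = max_group_error(groups, V_pca)
err_balanced = max_group_error(groups, V_balanced)
```

Here pooled PCA aligns with the larger group's dominant axis and leaves the smaller group with a large reconstruction error, while the balanced direction roughly halves the worst-case group error. That gap is what a max-over-groups objective is designed to close.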