AITopics | Dimensionality Reduction


Dimensionality Reduction

Dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection (find a subset of the original variables) and feature extraction (transform the data in the high-dimensional space to a space of fewer dimensions). (Wikipedia)
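The difference between the two families can be made concrete with a minimal sketch. The helper names `select_features` and `extract_features`, the variance-based selection rule, and the hand-picked weight vector are all illustrative assumptions, not part of the quoted definition.

```python
from statistics import pvariance

def select_features(rows, k):
    """Feature selection: keep k of the ORIGINAL columns (here: highest variance)."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])  # preserve the original column order
    return keep, [[row[j] for j in keep] for row in rows]

def extract_features(rows, weights):
    """Feature extraction: build NEW columns as linear combinations of the old ones.

    `weights` holds one weight vector per output dimension (assumed given).
    """
    return [[sum(w * x for w, x in zip(w_vec, row)) for w_vec in weights]
            for row in rows]

rows = [
    [1.0, 10.0, 0.5],
    [2.0, 10.1, 0.4],
    [3.0, 9.9, 0.6],
    [4.0, 10.0, 0.5],
]
kept, selected = select_features(rows, k=1)                    # a subset of columns
extracted = extract_features(rows, weights=[[1.0, 0.0, 1.0]])  # a mixed column
```

Selection returns values that already exist in the data, while extraction produces coordinates that need not correspond to any single original variable.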


CCP: Correlated Clustering and Projection for Dimensionality Reduction

Hozumi, Yuta, Wang, Rui, Wei, Guo-Wei

arXiv.org Machine Learning, Jun-8-2022

Most dimensionality reduction methods employ frequency domain representations obtained from matrix diagonalization and may not be efficient for large datasets with relatively high intrinsic dimensions. To address this challenge, Correlated Clustering and Projection (CCP) offers a novel data domain strategy that does not need to solve any matrix. CCP partitions high-dimensional features into correlated clusters and then projects correlated features in each cluster into a one-dimensional representation based on sample correlations. Residue-Similarity (R-S) scores and indexes, the shape of data in Riemannian manifolds, and algebraic topology-based persistent Laplacian are introduced for visualization and analysis. Proposed methods are validated with benchmark datasets associated with various machine learning algorithms.

artificial intelligence, correlated clustering and projection, machine learning, (2 more...)

arXiv.org Machine Learning

2206.04189

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.60)


Distribution Agnostic Symbolic Representations for Time Series Dimensionality Reduction and Online Anomaly Detection

Bountrogiannis, Konstantinos, Tzagkarakis, George, Tsakalides, Panagiotis

arXiv.org Artificial Intelligence, Jun-6-2022

Due to the importance of the lower bounding distances and the attractiveness of symbolic representations, the family of symbolic aggregate approximations (SAX) has been used extensively for encoding time series data. However, typical SAX-based methods rely on two restrictive assumptions: a Gaussian distribution and equiprobable symbols. This paper proposes two novel data-driven SAX-based symbolic representations, distinguished by their discretization steps. The first representation, oriented toward general data compaction and indexing scenarios, is based on the combination of kernel density estimation and Lloyd-Max quantization to minimize the information loss and mean squared error in the discretization step. The second method, oriented toward high-level mining tasks, employs the Mean-Shift clustering method and is shown to enhance anomaly detection in the lower-dimensional space. Besides, we verify on a theoretical basis a previously observed phenomenon of the intrinsic process that results in a lower-than-expected variance of the intermediate piecewise aggregate approximation. This phenomenon causes an additional information loss but can be avoided with a simple modification. The proposed representations possess all the attractive properties of the conventional SAX method. Furthermore, experimental evaluation on real-world datasets demonstrates their superiority over the traditional SAX and an alternative data-driven SAX variant.
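For orientation, here is a minimal sketch of the conventional SAX baseline that the abstract contrasts against, not of the paper's proposed representations. It makes the two assumptions visible: the breakpoints come from a standard Gaussian and split it into equiprobable regions. The function names, the four-letter alphabet, and the requirement that the series length be divisible by the segment count are assumptions made to keep the example short.

```python
from statistics import NormalDist, mean, pstdev

def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment.

    Assumes len(series) is divisible by n_segments.
    """
    seg_len = len(series) // n_segments
    return [mean(series[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

def sax(series, n_segments, alphabet="abcd"):
    """Conventional SAX: z-normalize, apply PAA, then discretize.

    Assumes the series is not constant (non-zero standard deviation).
    """
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series]
    a = len(alphabet)
    # Breakpoints that cut a standard Gaussian into `a` equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    symbols = []
    for value in paa(z, n_segments):
        idx = sum(value > b for b in breakpoints)  # number of breakpoints below value
        symbols.append(alphabet[idx])
    return "".join(symbols)

series = [0, 0, 3, 3, 5, 5, 8, 8]
word = sax(series, n_segments=4)
```

A data-driven variant in the spirit of the abstract would replace only the `breakpoints` line with values estimated from the data.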

dataset, representation, symbolic representation, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TKDE.2022.3174630

2105.09592

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.41)


Dimensionality Reduction: Principal Component Analysis

#artificialintelligence, Jun-1-2022, 18:20:51 GMT

A dataset is made up of a number of features. As long as these features are related in some way to the target and are optimal in number, a machine learning model will be able to produce decent results after learning from the data. But if the number of features is high and most of them do not contribute to the model's learning, the model's performance goes down and the time taken to output predictions increases. Transforming the original feature space into a lower-dimensional subspace is one way to perform dimensionality reduction, and Principal Component Analysis (PCA) does this. So let's take a look at the building blocks of PCA.
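The core steps of PCA (center the data, form the covariance matrix, find its leading eigenvector, project) can be sketched as follows. This is a minimal illustration, not the article's own code: it recovers only the first principal component, uses power iteration instead of a full eigendecomposition, and all function names and the toy data are assumptions.

```python
import math

def center(rows):
    """Subtract the per-feature mean so every column has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly apply the matrix and renormalize.

    Assumes a non-zero matrix and a start vector that is not orthogonal
    to the leading eigenvector.
    """
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

rows = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.0], [8.0, 8.0]]
centered = center(rows)
component = top_eigenvector(covariance(centered))
# Project each 2-D point onto the first principal component (2-D -> 1-D).
projected = [sum(x * c for x, c in zip(row, component)) for row in centered]
```

Because the two toy features are almost perfectly correlated, the single projected coordinate retains nearly all of the variance in the data.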

eigen value, eigen vector, vector, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.63)


A New Dimensionality Reduction Method Based on Hensel's Compression for Privacy Protection in Federated Learning

Ouadrhiri, Ahmed El, Abdelhadi, Ahmed

arXiv.org Artificial Intelligence, May-1-2022

Differential privacy (DP) is considered a de facto standard for protecting users' privacy in data analysis, machine learning, and deep learning. Existing DP-based privacy-preserving training approaches consist of adding noise to the clients' gradients before sharing them with the server. However, implementing DP on the gradient is not efficient because, by the composition theorem, privacy leakage increases with the number of synchronization training epochs. Recently, researchers were able to recover images used in the training dataset using a Generative Regression Neural Network (GRNN) even when the gradient was protected by DP. In this paper, we propose a two-layer privacy protection approach to overcome the limitations of the existing DP-based approaches. The first layer reduces the dimension of the training dataset based on Hensel's Lemma. We are the first to use Hensel's Lemma for reducing the dimension of (i.e., compressing) a dataset. The new dimensionality reduction method allows reducing the dimension of a dataset without losing information since Hensel's Lemma guarantees uniqueness. The second layer applies DP to the compressed dataset generated by the first layer. The proposed approach overcomes the problem of privacy leakage due to composition by applying DP only once before the training; clients train their local model on the privacy-preserving dataset generated by the second layer. Experimental results show that the proposed approach ensures strong privacy protection while achieving good accuracy. The new dimensionality reduction method achieves an accuracy of 97% with only 25% of the original data size.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICNC57223.2023.10074197

2205.02089

Country:

North America > United States > Texas > Harris County > Houston (0.14)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.83)


5 Papers to Read on Dimensionality Reduction Method in 2022

#artificialintelligence, Apr-12-2022, 05:00:06 GMT

Abstract: Dimension reduction is an important tool for analyzing high-dimensional data. The predictor envelope is a method of dimension reduction for regression that assumes certain linear combinations of the predictors are immaterial to the regression. The method can result in substantial gains in estimation efficiency and prediction accuracy over traditional maximum likelihood and least squares estimates. While predictor envelopes have been developed and studied for independent data, no work has been done adapting predictor envelopes to spatial data. In this work, the predictor envelope is adapted to a popular spatial model to form the spatial predictor envelope (SPE).

dimensionality reduction method, expression data, predictor envelope, (12 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.53)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.80)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.45)


Incorporating Texture Information into Dimensionality Reduction for High-Dimensional Images

Vieth, Alexander, Vilanova, Anna, Lelieveldt, Boudewijn, Eisemann, Elmar, Höllt, Thomas

arXiv.org Artificial Intelligence, Mar-2-2022

High-dimensional imaging is becoming increasingly relevant in many fields from astronomy and cultural heritage to systems biology. Visual exploration of such high-dimensional data is commonly facilitated by dimensionality reduction. [...] Consequently, exploration of such data is typically split into a step focusing on the attribute space followed by [...] In this paper, we present a method for incorporating spatial neighborhood information into distance-based dimensionality reduction methods, such as [...] We achieve this by modifying the distance measure between high-dimensional attribute vectors associated with each pixel such that it takes the pixel's spatial neighborhood into account. Based on a classification of different methods for comparing image patches, we explore a [...] We compare these approaches from a theoretical and experimental point of view. [...] evaluation on synthetic data and two real-world use cases.

Figure 1: Texture-aware dimensionality reduction. An image (a) with black and white pixels forms multiple textures. Distance-based dimensionality reduction produces one cluster of black and one cluster of white pixels (b); a texture-aware version should create clusters for the different textures (c).

High-dimensional data is commonly acquired and analyzed in various application domains, from systems biology [26] to insurance fraud detection [37]. Typically, high-dimensional data are tabular data with many columns (or attributes), corresponding to the dimensionality [...] The spatial configuration is, however, commonly of interest when analyzing high-dimensional image data. [...] neighborhood information into account, in addition to high-dimensional [...] Typical approaches to combine high-dimensional [...] They use the embedding as a colormap and perform segmentation on the re-colored image. Decoupling the high-dimensional and spatial analysis in such a way has several downsides: Most importantly, boundaries between clusters in an embedding are often not well defined, and as such classification is ambiguous and has a level of arbitrariness.

data mining, machine learning, pixel, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/PacificVis53943.2022.00010

2202.09179

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
North America > United States (0.04)
(2 more...)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)


A Dimensionality Reduction Method for Finding Least Favorable Priors with a Focus on Bregman Divergence

Dytso, Alex, Goldenbaum, Mario, Poor, H. Vincent, Shamai, Shlomo

arXiv.org Machine Learning, Feb-23-2022

A common way of characterizing minimax estimators in point estimation is by moving the problem into the Bayesian estimation domain and finding a least favorable prior distribution. The Bayesian estimator induced by a least favorable prior, under mild conditions, is then known to be minimax. However, finding least favorable distributions can be challenging due to inherent optimization over the space of probability distributions, which is infinite-dimensional. This paper develops a dimensionality reduction method that allows us to move the optimization to a finite-dimensional setting with an explicit bound on the dimension. The benefit of this dimensionality reduction is that it permits the use of popular algorithms such as projected gradient ascent to find least favorable priors. Throughout the paper, in order to make progress on the problem, we restrict ourselves to Bayesian risks induced by a relatively large class of loss functions, namely Bregman divergences.

extreme point, mass point, theorem 2, (13 more...)

arXiv.org Machine Learning

2202.11598

Country:

Asia > Middle East > Israel (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New Jersey (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)


Non-Linear Spectral Dimensionality Reduction Under Uncertainty

Laakom, Firas, Raitoharju, Jenni, Passalis, Nikolaos, Iosifidis, Alexandros, Gabbouj, Moncef

arXiv.org Artificial Intelligence, Feb-9-2022

In this paper, we consider the problem of non-linear dimensionality reduction under uncertainty, from both theoretical and algorithmic perspectives. Since real-world data usually contain measurements with uncertainties and artifacts, the input space in the proposed framework consists of probability distributions to model the uncertainties associated with each sample. We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches, e.g., KPCA, MDA/KMFA, to receive as inputs the probability distributions instead of the original data. We show that the proposed NGEU formulation exhibits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework. Empirical results on different datasets show the effectiveness of the proposed framework.

dataset, ngeu, rademacher complexity, (12 more...)

arXiv.org Artificial Intelligence

2202.04678

Country:

Europe > Finland > Pirkanmaa > Tampere (0.04)
North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.82)


Dimensionality Reduction Meets Message Passing for Graph Node Embeddings

Sadowski, Krzysztof, Szarmach, Michał, Mattia, Eddie

arXiv.org Machine Learning, Feb-2-2022

Graph Neural Networks (GNNs) have become a popular approach for various applications, ranging from social network analysis to modeling chemical properties of molecules. While GNNs often show remarkable performance on public datasets, they can struggle to learn long-range dependencies in the data due to over-smoothing and over-squashing tendencies. To alleviate this challenge, we propose PCAPass, a method which combines Principal Component Analysis (PCA) and message passing for generating node embeddings in an unsupervised manner and leverages gradient boosted decision trees for classification tasks. We show empirically that this approach provides competitive performance compared to popular GNNs on node classification benchmarks, while gathering information from longer distance neighborhoods. Our research demonstrates that applying dimensionality reduction with message passing and skip connections is a promising mechanism for aggregating long-range dependencies in graph structured data.

dimensionality reduction meet message passing, information, node, (13 more...)

arXiv.org Machine Learning

2202.00408

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Why you should be using PHATE for dimensionality reduction

#artificialintelligence, Jan-28-2022, 22:50:15 GMT

As data scientists, we often work with high-dimensional data with more than 3 features, or dimensions, of interest. In supervised machine learning, we may use this data for training and classification, for example, and may reduce the dimensions to speed up training. In unsupervised learning, we use this type of data for visualization and clustering. In single-cell RNA sequencing (scRNA-seq), for example, we accumulate measurements of tens of thousands of genes per cell for upwards of a million cells. That's a lot of data that provides a window into the cell's identity, state, and other properties.

dataset, phate, probability, (14 more...)

#artificialintelligence

Industry: Health & Medicine (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.40)
