AITopics | Dimensionality Reduction

Collaborating Authors

Dimensionality Reduction

Dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection (find a subset of the original variables) and feature extraction (transform the data in the high-dimensional space to a space of fewer dimensions). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Towards One Model for Classical Dimensionality Reduction: A Probabilistic Perspective on UMAP and t-SNE

Ravuri, Aditya, Lawrence, Neil D.

arXiv.org Machine LearningMay-27-2024

This paper shows that the dimensionality reduction methods, UMAP and t-SNE, can be approximately recast as MAP inference methods corresponding to a generalized Wishart-based model introduced in ProbDR. This interpretation offers deeper theoretical insights into these algorithms, while introducing tools with which similar dimensionality reduction methods can be studied.

algorithm, equation, probability, (13 more...)

arXiv.org Machine Learning

2405.17412

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)

Add feedback

Gradient Boosting Mapping for Dimensionality Reduction and Feature Extraction

Patron, Anri, Prasad, Ayush, Luu, Hoang Phuc Hau, Puolamäki, Kai

arXiv.org Artificial IntelligenceMay-14-2024

A fundamental problem in supervised learning is to find a good set of features or distance measures. If the new set of features is of lower dimensionality and can be obtained by a simple transformation of the original data, they can make the model understandable, reduce overfitting, and even help to detect distribution drift. We propose a supervised dimensionality reduction method Gradient Boosting Mapping (GBMAP), where the outputs of weak learners -- defined as one-layer perceptrons -- define the embedding. We show that the embedding coordinates provide better features for the supervised learning task, making simple linear models competitive with the state-of-the-art regressors and classifiers. We also use the embedding to find a principled distance measure between points. The features and distance measures automatically ignore directions irrelevant to the supervised learning task. We also show that we can reliably detect out-of-distribution data points with potentially large regression or classification errors. GBMAP is fast and works in seconds for dataset of million data points or hundreds of features. As a bonus, GBMAP provides a regression and classification performance comparable to the state-of-the-art supervised learning methods.

artificial intelligence, inductive learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2405.08486

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Nonnegative Matrix Factorization in Dimensionality Reduction: A Survey

Saberi-Movahed, Farid, Berahman, Kamal, Sheikhpour, Razieh, Li, Yuefeng, Pan, Shirui

arXiv.org Artificial IntelligenceMay-6-2024

Dimensionality Reduction plays a pivotal role in improving feature learning accuracy and reducing training time by eliminating redundant features, noise, and irrelevant data. Nonnegative Matrix Factorization (NMF) has emerged as a popular and powerful method for dimensionality reduction. Despite its extensive use, there remains a need for a comprehensive analysis of NMF in the context of dimensionality reduction. To address this gap, this paper presents a comprehensive survey of NMF, focusing on its applications in both feature extraction and feature selection. We introduce a classification of dimensionality reduction, enhancing understanding of the underlying concepts. Subsequently, we delve into a thorough summary of diverse NMF approaches used for feature extraction and selection. Furthermore, we discuss the latest research trends and potential future directions of NMF in dimensionality reduction, aiming to highlight areas that need further exploration and development.

artificial intelligence, data mining, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2405.03615

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
Europe > Finland > North Karelia > Joensuu (0.04)
Asia > Middle East > Iran > Kerman Province > Kerman (0.04)
(6 more...)

Genre: Overview (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

Utilizing Machine Learning and 3D Neuroimaging to Predict Hearing Loss: A Comparative Analysis of Dimensionality Reduction and Regression Techniques

Pittala, Trinath Sai Subhash Reddy, Meleti, Uma Maheswara R, Thatipamula, Manasa

arXiv.org Artificial IntelligenceMay-1-2024

In this project, we have explored machine learning approaches for predicting hearing loss thresholds on the brain's gray matter 3D images. We have solved the problem statement in two phases. In the first phase, we used a 3D CNN model to reduce high-dimensional input into latent space and decode it into an original image to represent the input in rich feature space. In the second phase, we utilized this model to reduce input into rich features and used these features to train standard machine learning models for predicting hearing thresholds. We have experimented with autoencoders and variational autoencoders in the first phase for dimensionality reduction and explored random forest, XGBoost and multi-layer perceptron for regressing the thresholds. We split the given data set into training and testing sets and achieved an 8.80 range and 22.57 range for PT500 and PT4000 on the test set, respectively. We got the lowest RMSE using multi-layer perceptron among the other models. Our approach leverages the unique capabilities of VAEs to capture complex, non-linear relationships within high-dimensional neuroimaging data. We rigorously evaluated the models using various metrics, focusing on the root mean squared error (RMSE). The results highlight the efficacy of the multi-layer neural network model, which outperformed other techniques in terms of accuracy. This project advances the application of data mining in medical diagnostics and enhances our understanding of age-related hearing loss through innovative machine-learning frameworks.

autoencoder, dimensionality reduction, hearing threshold, (12 more...)

arXiv.org Artificial Intelligence

2405.00142

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback

CBMAP: Clustering-based manifold approximation and projection for dimensionality reduction

Dogan, Berat

arXiv.org Artificial IntelligenceApr-27-2024

Dimensionality reduction methods are employed to decrease data dimensionality, either to enhance machine learning performance or to facilitate data visualization in two or three-dimensional spaces. These methods typically fall into two categories: feature selection and feature transformation. Feature selection retains significant features, while feature transformation projects data into a lower-dimensional space, with linear and nonlinear methods. While nonlinear methods excel in preserving local structures and capturing nonlinear relationships, they may struggle with interpreting global structures and can be computationally intensive. Recent algorithms, such as the t-SNE, UMAP, TriMap, and PaCMAP prioritize preserving local structures, often at the expense of accurately representing global structures, leading to clusters being spread out more in lower-dimensional spaces. Moreover, these methods heavily rely on hyperparameters, making their results sensitive to parameter settings. To address these limitations, this study introduces a clustering-based approach, namely CBMAP (Clustering-Based Manifold Approximation and Projection), for dimensionality reduction. CBMAP aims to preserve both global and local structures, ensuring that clusters in lower-dimensional spaces closely resemble those in high-dimensional spaces. Experimental evaluations on benchmark datasets demonstrate CBMAP's efficacy, offering speed, scalability, and minimal reliance on hyperparameters. Importantly, CBMAP enables low-dimensional projection of test data, addressing a critical need in machine learning applications. CBMAP is made freely available at https://github.com/doganlab/cbmap and can be installed from the Python Package Directory (PyPI) software repository with the command pip install cbmap.

algorithm, cluster center, dataset, (13 more...)

arXiv.org Artificial Intelligence

2404.1794

Country:

North America > United States > California > Ventura County > Thousand Oaks (0.04)
Asia > Middle East > Republic of Türkiye > Malatya Province > Malatya (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.84)

Add feedback

Formation-Controlled Dimensionality Reduction

Jeong, Taeuk, Jung, Yoon Mo

arXiv.org Artificial IntelligenceApr-10-2024

Dimensionality reduction represents the process of extracting low dimensional structure from high dimensional data. High dimensional data include multimedia databases, gene expression microarrays, and financial time series, for example. In order to deal with such real-world data properly, it is better to reduce its dimensionality to avoid undesired properties of high dimensions such as the curse of dimensionality [14, 11]. As a result, classification, visualization, and compression of data can be expedited, for example [14]. In many problems, it is presumed that the dimensionality of the measured data is only artificially high; the measured data are high-dimensional but data nearly have a lower-dimensional structure, since they are multiple, indirect measurements of an underlying factors, which typically cannot be directly calibrated [4].

dataset, formation control, neighbor point, (13 more...)

arXiv.org Artificial Intelligence

2404.06808

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.64)

Add feedback

Dimensionality Reduction in Sentence Transformer Vector Databases with Fast Fourier Transform

Bulgakov, Vitaly, Segal, Alec

arXiv.org Artificial IntelligenceApr-9-2024

Dimensionality reduction in vector databases is pivotal for streamlining AI data management, enabling efficient storage, faster computation, and improved model performance. This paper explores the benefits of reducing vector database dimensions, with a focus on computational efficiency and overcoming the curse of dimensionality. We introduce a novel application of Fast Fourier Transform (FFT) to dimensionality reduction, a method previously underexploited in this context. By demonstrating its utility across various AI domains, including Retrieval-Augmented Generation (RAG) models and image processing, this FFT-based approach promises to improve data retrieval processes and enhance the efficiency and scalability of AI solutions. The incorporation of FFT may not only optimize operations in real-time processing and recommendation systems but also extend to advanced image processing techniques, where dimensionality reduction can significantly improve performance and analysis efficiency. This paper advocates for the broader adoption of FFT in vector database management, marking a significant stride towards addressing the challenges of data volume and complexity in AI research and applications. Unlike many existing approaches, we directly handle the embedding vectors produced by the model after processing a test input.

data quality, fourier transform, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2404.06278

Country: North America > United States (0.31)

Genre: Research Report (0.84)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.71)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.46)
Energy > Oil & Gas > Midstream (0.46)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Implementation of the Principal Component Analysis onto High-Performance Computer Facilities for Hyperspectral Dimensionality Reduction: Results and Comparisons

Martel, E., Lazcano, R., Lopez, J., Madroñal, D., Salvador, R., Lopez, S., Juarez, E., Guerra, R., Sanz, C., Sarmiento, R.

arXiv.org Artificial IntelligenceMar-27-2024

Dimensionality reduction represents a critical preprocessing step in order to increase the efficiency and the performance of many hyperspectral imaging algorithms. However, dimensionality reduction algorithms, such as the Principal Component Analysis (PCA), suffer from their computationally demanding nature, becoming advisable for their implementation onto high-performance computer architectures for applications under strict latency constraints. This work presents the implementation of the PCA algorithm onto two different high-performance devices, namely, an NVIDIA Graphics Processing Unit (GPU) and a Kalray manycore, uncovering a highly valuable set of tips and tricks in order to take full advantage of the inherent parallelism of these high-performance computing platforms, and hence, reducing the time that is required to process a given hyperspectral image. Moreover, the achieved results obtained with different hyperspectral images have been compared with the ones that were obtained with a field programmable gate array (FPGA)-based implementation of the PCA algorithm that has been recently published, providing, for the first time in the literature, a comprehensive analysis in order to highlight the pros and cons of each option.

algorithm, implementation, matrix, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/rs10060864

2403.18321

Country:

Europe > Spain > Canary Islands > Gran Canaria > Las Palmas de Gran Canaria (0.14)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(13 more...)

Genre:

Workflow (0.87)
Research Report (0.63)

Industry:

Information Technology (0.89)
Government > Regional Government (0.67)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)

Add feedback

S+t-SNE -- Bringing dimensionality reduction to data streams

Vieira, Pedro C., Montrezol, João P., Vieira, João T., Gama, João

arXiv.org Artificial IntelligenceMar-26-2024

We present S+t-SNE, an adaptation of the t-SNE algorithm designed to handle infinite data streams. The core idea behind S+t-SNE is to update the t-SNE embedding incrementally as new data arrives, ensuring scalability and adaptability to handle streaming scenarios. By selecting the most important points at each step, the algorithm ensures scalability while keeping informative visualisations. Employing a blind method for drift management adjusts the embedding space, facilitating continuous visualisation of evolving data dynamics. Our experimental evaluations demonstrate the effectiveness and efficiency of S+t-SNE. The results highlight its ability to capture patterns in a streaming scenario. We hope our approach offers researchers and practitioners a real-time tool for understanding and interpreting high-dimensional data.

dataset, dimensionality reduction, projection, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-58553-1_8

2403.17643

Country: Europe > Portugal > Coimbra > Coimbra (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.43)

Add feedback

Evaluating Unsupervised Dimensionality Reduction Methods for Pretrained Sentence Embeddings

Zhang, Gaifan, Zhou, Yi, Bollegala, Danushka

arXiv.org Artificial IntelligenceMar-20-2024

Sentence embeddings produced by Pretrained Language Models (PLMs) have received wide attention from the NLP community due to their superior performance when representing texts in numerous downstream applications. However, the high dimensionality of the sentence embeddings produced by PLMs is problematic when representing large numbers of sentences in memory- or compute-constrained devices. As a solution, we evaluate unsupervised dimensionality reduction methods to reduce the dimensionality of sentence embeddings produced by PLMs. Our experimental results show that simple methods such as Principal Component Analysis (PCA) can reduce the dimensionality of sentence embeddings by almost $50\%$, without incurring a significant loss in performance in multiple downstream tasks. Surprisingly, reducing the dimensionality further improves performance over the original high-dimensional versions for the sentence embeddings produced by some PLMs in some tasks.

computational linguistic, dimensionality, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2403.14001

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > New York > New York County > New York City (0.04)
(10 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback