AITopics | Dimensionality Reduction

Collaborating Authors

Dimensionality Reduction

Dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection (find a subset of the original variables) and feature extraction (transform the data in the high-dimensional space to a space of fewer dimensions). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Ultralow-dimensionality reduction for identifying critical transitions by spatial-temporal PCA

Chen, Pei, Suo, Yaofang, Liu, Rui, Chen, Luonan

arXiv.org Machine LearningJan-21-2025

Discovering dominant patterns and exploring dynamic behaviors especially critical state transitions and tipping points in high-dimensional time-series data are challenging tasks in study of real-world complex systems, which demand interpretable data representations to facilitate comprehension of both spatial and temporal information within the original data space. Here, we proposed a general and analytical ultralow-dimensionality reduction method for dynamical systems named spatial-temporal principal component analysis (stPCA) to fully represent the dynamics of a high-dimensional time-series by only a single latent variable without distortion, which transforms high-dimensional spatial information into one-dimensional temporal information based on nonlinear delay-embedding theory. The dynamics of this single variable is analytically solved and theoretically preserves the temporal property of original high-dimensional time-series, thereby accurately and reliably identifying the tipping point before an upcoming critical transition. Its applications to real-world datasets such as individual-specific heterogeneous ICU records demonstrated the effectiveness of stPCA, which quantitatively and robustly provides the early-warning signals of the critical/tipping state on each patient.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2501.12582

Country:

North America > United States (0.28)
Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Reviews: Dimensionality Reduction of Massive Sparse Datasets Using Coresets

Neural Information Processing SystemsJan-20-2025, 18:43:22 GMT

This paper makes some pretty critical mistakes regarding previous work. For one, they cite [8], but they should in fact be citing Cohen et al. "Dimensionality Reduction for k-Means Clustering and Low Rank Approximation" This is not just a typo - the authors go on to state a result of [8] about operator norm rather than the result of the Cohen et al. paper - namely, the Cohen et al. paper achieves O(k/eps 2) rescaled columns deterministically for exactly the same problem considered in this submission - see part 5 of Lemma 11 and section 7.3 based on BSS. This is much stronger than the O(k 2/eps 2) rescaled columns achieved in the submission. This directly contradicts their sentence "Our main result is the first algorithm for computing an (k,eps)-coreset C of size independent of both n and d". The authors also say later [8,7] minimize the 2-norm - [8] is the wrong reference again!

coreset, dimensionality reduction, massive sparse dataset, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.63)

Add feedback

Reviews: Large Margin Discriminant Dimensionality Reduction in Prediction Space

Neural Information Processing SystemsJan-20-2025, 12:05:58 GMT

The authors modify the MCBoost criterion, in order to allow for multi-class boosting that is based on arbitrary number of dimensions (compared to a previous formulation that limits the number of dimensions to the number of classes). This lift of the limits in terms of dimensionality allows for a boosting-like framework that is comprised of controllable amount of boosting functions, and thus can be used as. The connection between MC-Boost and MV-SVM is interesting, and the discussion is good. Is the fact that both MC-SVM and MC-Boost try to maximise the margin well known? The authors present improved results in terms of error rate, and in terms of mAP.

ladder outperform mcboost, margin discriminant dimensionality reduction, prediction space, (4 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.39)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.41)

Add feedback

Unified dimensionality reduction techniques in chronic liver disease detection

Karna, Anand, Khan, Naina, Rauniyar, Rahul, Shambharkar, Prashant Giridhar

arXiv.org Artificial IntelligenceDec-30-2024

Globally, chronic liver disease continues to be a major health concern that requires precise predictive models for prompt detection and treatment. Using the Indian Liver Patient Dataset (ILPD) from the University of California at Irvine's UCI Machine Learning Repository, a number of machine learning algorithms are investigated in this study. The main focus of our research is this dataset, which includes the medical records of 583 patients, 416 of whom have been diagnosed with liver disease and 167 of whom have not. There are several aspects to this work, including feature extraction and dimensionality reduction methods like Linear Discriminant Analysis (LDA), Factor Analysis (FA), t-distributed Stochastic Neighbour Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). The purpose of the study is to investigate how well these approaches work for converting high-dimensional datasets and improving prediction accuracy. To assess the prediction ability of the improved models, a number of classification methods were used, such as Multi-layer Perceptron, Random Forest, K-nearest neighbours, and Logistic Regression. Remarkably, the improved models performed admirably, with Random Forest having the highest accuracy of 98.31\% in 10-fold cross-validation and 95.79\% in train-test split evaluation. Findings offer important new perspectives on the choice and use of customized feature extraction and dimensionality reduction methods, which improve predictive models for patients with chronic liver disease.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2412.21156

Country: North America > United States > California (0.24)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Hepatology (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Integrating Random Effects in Variational Autoencoders for Dimensionality Reduction of Correlated Data

Simchoni, Giora, Rosset, Saharon

arXiv.org Machine LearningDec-24-2024

Variational Autoencoders (VAE) are widely used for dimensionality reduction of large-scale tabular and image datasets, under the assumption of independence between data observations. In practice, however, datasets are often correlated, with typical sources of correlation including spatial, temporal and clustering structures. Inspired by the literature on linear mixed models (LMM), we propose LMMVAE -- a novel model which separates the classic VAE latent model into fixed and random parts. While the fixed part assumes the latent variables are independent as usual, the random part consists of latent variables which are correlated between similar clusters in the data such as nearby locations or successive measurements. The classic VAE architecture and loss are modified accordingly. LMMVAE is shown to improve squared reconstruction error and negative likelihood loss significantly on unseen data, with simulated as well as real datasets from various applications and correlation scenarios. It also shows improvement in the performance of downstream tasks such as supervised classification on the learned representations.

categorical feature, dataset, reconstruction error, (13 more...)

arXiv.org Machine Learning

2412.16899

Country:

Europe > United Kingdom (0.28)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Palestine (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback

Dimensionality Reduction Techniques for Global Bayesian Optimisation

Long, Luo, Cartis, Coralia, Shustin, Paz Fink

arXiv.org Artificial IntelligenceDec-12-2024

Bayesian Optimisation (BO) is a state-of-the-art global optimisation technique for black-box problems where derivative information is unavailable, and sample efficiency is crucial. However, improving the general scalability of BO has proved challenging. Here, we explore Latent Space Bayesian Optimisation (LSBO), that applies dimensionality reduction to perform BO in a reduced-dimensional subspace. While early LSBO methods used (linear) random projections (Wang et al., 2013), we employ Variational Autoencoders (VAEs) to manage more complex data structures and general DR tasks. Building on Grosnit et. al. (2021), we analyse the VAE-based LSBO framework, focusing on VAE retraining and deep metric loss. We suggest a few key corrections in their implementation, originally designed for tasks such as molecule generation, and reformulate the algorithm for broader optimisation purposes. Our numerical results show that structured latent manifolds improve BO performance. Additionally, we examine the use of the Mat\'{e}rn-$\frac{5}{2}$ kernel for Gaussian Processes in this LSBO context. We also integrate Sequential Domain Reduction (SDR), a standard global optimization efficiency strategy, into BO. SDR is included in a GPU-based environment using \textit{BoTorch}, both in the original and VAE-generated latent spaces, marking the first application of SDR within LSBO.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.09183

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback

When Dimensionality Reduction Meets Graph (Drawing) Theory: Introducing a Common Framework, Challenges and Opportunities

Paulovich, Fernando, Arleo, Alessio, Elzen, Stef van den

arXiv.org Artificial IntelligenceDec-9-2024

In the vast landscape of visualization research, Dimensionality Reduction (DR) and graph analysis are two popular subfields, often essential to most visual data analytics setups. DR aims to create representations to support neighborhood and similarity analysis on complex, large datasets. Graph analysis focuses on identifying the salient topological properties and key actors within networked data, with specialized research on investigating how such features could be presented to the user to ease the comprehension of the underlying structure. Although these two disciplines are typically regarded as disjoint subfields, we argue that both fields share strong similarities and synergies that can potentially benefit both. Therefore, this paper discusses and introduces a unifying framework to help bridge the gap between DR and graph (drawing) theory. Our goal is to use the strongly math-grounded graph theory to improve the overall process of creating DR visual representations. We propose how to break the DR process into well-defined stages, discussing how to match some of the DR state-of-the-art techniques to this framework and presenting ideas on how graph drawing, topology features, and some popular algorithms and strategies used in graph analysis can be employed to improve DR topology extraction, embedding generation, and result validation. We also discuss the challenges and identify opportunities for implementing and using our framework, opening directions for future visualization research.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.06555

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Europe > United Kingdom > Wales (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)

Add feedback

An Automated Data Mining Framework Using Autoencoders for Feature Extraction and Dimensionality Reduction

Liang, Yaxin, Li, Xinshi, Huang, Xin, Zhang, Ziqi, Yao, Yue

arXiv.org Artificial IntelligenceDec-3-2024

This study proposes an automated data mining framework based on autoencoders and experimentally verifies its effectiveness in feature extraction and data dimensionality reduction. Through the encoding-decoding structure, the autoencoder can capture the data's potential characteristics and achieve noise reduction and anomaly detection, providing an efficient and stable solution for the data mining process. The experiment compared the performance of the autoencoder with traditional dimensionality reduction methods (such as PCA, FA, T-SNE, and UMAP). The results showed that the autoencoder performed best in terms of reconstruction error and root mean square error and could better retain data structure and enhance the generalization ability of the model. The autoencoder-based framework not only reduces manual intervention but also significantly improves the automation of data processing. In the future, with the advancement of deep learning and big data technology, the autoencoder method combined with a generative adversarial network (GAN) or graph neural network (GNN) is expected to be more widely used in the fields of complex data processing, real-time data analysis and intelligent decision-making.

autoencoder, data mining, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2412.02211

Country:

North America > United States > Virginia (0.04)
North America > United States > New Jersey (0.04)
North America > United States > California > San Bernardino County > Montclair (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.69)
Materials > Metals & Mining (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.84)

Add feedback

Navigating the Effect of Parametrization for Dimensionality Reduction

Huang, Haiyang, Wang, Yingfan, Rudin, Cynthia

arXiv.org Artificial IntelligenceNov-24-2024

Parametric dimensionality reduction methods have gained prominence for their ability to generalize to unseen datasets, an advantage that traditional approaches typically lack. Despite their growing popularity, there remains a prevalent misconception among practitioners about the equivalence in performance between parametric and non-parametric methods. Here, we show that these methods are not equivalent -- parametric methods retain global structure but lose significant local details. To explain this, we provide evidence that parameterized approaches lack the ability to repulse negative pairs, and the choice of loss function also has an impact. Addressing these issues, we developed a new parametric method, ParamRepulsor, that incorporates Hard Negative Mining and a loss function that applies a strong repulsive force. This new method achieves state-of-the-art performance on local structure preservation for parametric methods without sacrificing the fidelity of global structural representation. Our code is available at https://github.com/hyhuang00/ParamRepulsor.

algorithm, dataset, preservation, (15 more...)

arXiv.org Artificial Intelligence

2411.15894

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.68)
Information Technology (0.67)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction

Zang, Zelin, Wang, Yuhao, Wu, Jinlin, Liu, Hong, Shen, Yue, Li, Stan. Z, Lei, Zhen

arXiv.org Artificial IntelligenceOct-25-2024

Dimensionality reduction (DR) plays a crucial role in various fields, including data engineering and visualization, by simplifying complex datasets while retaining essential information. However, the challenge of balancing DR accuracy and interpretability remains crucial, particularly for users dealing with high-dimensional data. Traditional DR methods often face a trade-off between precision and transparency, where optimizing for performance can lead to reduced interpretability, and vice versa. This limitation is especially prominent in real-world applications such as image, tabular, and text data analysis, where both accuracy and interpretability are critical. To address these challenges, this work introduces the MOE-based Hyperbolic Interpretable Deep Manifold Transformation (DMT-HI). The proposed approach combines hyperbolic embeddings, which effectively capture complex hierarchical structures, with Mixture of Experts (MOE) models, which dynamically allocate tasks based on input features. DMT-HI enhances DR accuracy by leveraging hyperbolic embeddings to represent the hierarchical nature of data, while also improving interpretability by explicitly linking input data, embedding outcomes, and key features through the MOE structure. Extensive experiments demonstrate that DMT-HI consistently achieves superior performance in both DR accuracy and model interpretability, making it a robust solution for complex data analysis. The code is available at \url{https://github.com/zangzelin/code_dmthi}.

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2410.19504

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Monaco (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)

Add feedback