Learning in High Dimensional Spaces
Structure-Preserving Nonlinear Sufficient Dimension Reduction for Tensors
Lin, Dianjun, Li, Bing, Xue, Lingzhou
We introduce two nonlinear sufficient dimension reduction methods for regressions with tensor-valued predictors. Our goal is two-fold: the first is to preserve the tensor structure when performing dimension reduction, particularly the meaning of the tensor modes, for improved interpretation; the second is to substantially reduce the number of parameters in dimension reduction, thereby achieving model parsimony and enhancing estimation accuracy. Our two tensor dimension reduction methods echo the two commonly used tensor decomposition mechanisms: one is the Tucker decomposition, which reduces a larger tensor to a smaller one; the other is the CP-decomposition, which represents an arbitrary tensor as a sequence of rank-one tensors. We developed the Fisher consistency of our methods at the population level and established their consistency and convergence rates. Both methods are easy to implement numerically: the Tucker-form can be implemented through a sequence of least-squares steps, and the CP-form can be implemented through a sequence of singular value decompositions. We investigated the finite-sample performance of our methods and showed substantial improvement in accuracy over existing methods in simulations and two data applications.
- Africa > Senegal > Kolda Region > Kolda (0.04)
- North America > United States > Pennsylvania (0.04)
- Health & Medicine > Health Care Technology (0.67)
- Health & Medicine > Therapeutic Area (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
On Conditional Stochastic Interpolation for Generative Nonlinear Sufficient Dimension Reduction
Xu, Shuntuo, Yu, Zhou, Huang, Jian
Identifying low-dimensional sufficient structures in nonlinear sufficient dimension reduction (SDR) has long been a fundamental yet challenging problem. Most existing methods lack theoretical guarantees of exhaustiveness in identifying lower dimensional structures, either at the population level or at the sample level. We tackle this issue by proposing a new method, generative sufficient dimension reduction (GenSDR), which leverages modern generative models. We show that GenSDR is able to fully recover the information contained in the central $σ$-field at both the population and sample levels. In particular, at the sample level, we establish a consistency property for the GenSDR estimator from the perspective of conditional distributions, capitalizing on the distributional learning capabilities of deep generative models. Moreover, by incorporating an ensemble technique, we extend GenSDR to accommodate scenarios with non-Euclidean responses, thereby substantially broadening its applicability. Extensive numerical results demonstrate the outstanding empirical performance of GenSDR and highlight its strong potential for addressing a wide range of complex, real-world tasks.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Consensus dimension reduction via multi-view learning
Dimension reduction methods are a fundamental class of techniques in data analysis, which aim to find a lower-dimensional representation of higher-dimensional data while preserving as much of the original information as possible. These methods are extensively used in practice, including in exploratory data analyses to visualize data--arguably, one of the first and most vital steps in any data analysis (Ray et al., 2021). Notably, in genomics, dimension reduction methods are ubiquitously applied to visualize high-dimensional single-cell RNA sequencing data in two dimensions (Becht et al., 2019). Beyond visualization, dimension reduction methods are also frequently employed to mitigate the curse of dimensionality (Bellman, 1957), engineer new features to improve downstream tasks like prediction (e.g., Massy, 1965), and enable scientific discovery in unsupervised learning settings (Chang et al., 2025). For example, many researchers have used dimension reduction in conjunction with clustering to discover new cell types and cell states (Wu et al., 2021), new cancer subtypes (Northcott et al., 2017), and other substantively-meaningful structure in a variety of domains (Bergen et al., 2019; Traven et al., 2017). Given the widespread use and need for dimension reduction methods, numerous dimension reduction techniques have been developed. Popular techniques include but are not limited to principal component analysis (PCA) (Pearson, 1901; Hotelling, 1933), multidimensional scaling (MDS) (Torgerson, 1952; Kruskal, 1964a), Isomap (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis and Saul, 2000), t-distributed stochastic neighbor embedding (t-SNE) (van der 1
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
Contrastive Dimension Reduction: A Systematic Review
Hawke, Sam, Zhang, Eric, Chen, Jiawen, Li, Didong
Contrastive dimension reduction (CDR) methods aim to extract signal unique to or enriched in a treatment (foreground) group relative to a control (background) group. This setting arises in many scientific domains, such as genomics, imaging, and time series analysis, where traditional dimension reduction techniques such as Principal Component Analysis (PCA) may fail to isolate the signal of interest. In this review, we provide a systematic overview of existing CDR methods. We propose a pipeline for analyzing case-control studies together with a taxonomy of CDR methods based on their assumptions, objectives, and mathematical formulations, unifying disparate approaches under a shared conceptual framework. We highlight key applications and challenges in existing CDR methods, and identify open questions and future directions. By providing a clear framework for CDR and its applications, we aim to facilitate broader adoption and motivate further developments in this emerging field.
- North America > United States > North Carolina (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.94)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.40)
456ac9b0d15a8b7f1e71073221059886-Reviews.html
"NIPS 2013 Neural Information Processing Systems December 5 - 10, Lake Tahoe, Nevada, USA",,, "Paper ID:","1051" "Title:","Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation" Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper studies the problem of identifying Gaussians in a mixture in high dimensions when the separation between the Gaussians is small. The assumption is that the Gaussians are separated along few dimensions and hence by identifying these dimensions, that is, feature selection, the curse of dimensionality can be bitten and the Gaussians can be found. Clustering in high dimension is an open problem that well deserve a study. The theoretical approach taken by the authors is good step in the path towards better understanding the problem.
Interpretable dimension reduction for compositional data
Park, Junyoung, Park, Cheolwoo, Ahn, Jeongyoun
High-dimensional compositional data, such as those from human microbiome studies, pose unique statistical challenges due to the simplex constraint and excess zeros. While dimension reduction is indispensable for analyzing such data, conventional approaches often rely on log-ratio transformations that compromise interpretability and distort the data through ad hoc zero replacements. We introduce a novel framework for interpretable dimension reduction of compositional data that avoids extra transformations and zero imputations. Our approach generalizes the concept of amalgamation by softening its operation, mapping high-dimensional compositions directly to a lower-dimensional simplex, which can be visualized in ternary plots. The framework further provides joint visualization of the reduction matrix, enabling intuitive, at-a-glance interpretation. To achieve optimal reduction within our framework, we incorporate sufficient dimension reduction, which defines a new identifiable objective: the central compositional subspace. For estimation, we propose a compositional kernel dimension reduction (CKDR) method. The estimator is provably consistent, exhibits sparsity that reveals underlying amalgamation structures, and comes with an intrinsic predictive model for downstream analyses. Applications to real microbiome datasets demonstrate that our approach provides a powerful graphical exploration tool for uncovering meaningful biological patterns, opening a new pathway for analyzing high-dimensional compositional data.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Michigan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
DIT: Dimension Reduction View on Optimal NFT Rarity Meters
Belousov, Dmitry, Yanovich, Yury
Non-fungible tokens (NFTs) have become a significant digital asset class, each uniquely representing virtual entities such as artworks. These tokens are stored in collections within smart contracts and are actively traded across platforms on Ethereum, Bitcoin, and Solana blockchains. The value of NFTs is closely tied to their distinctive characteristics that define rarity, leading to a growing interest in quantifying rarity within both industry and academia. While there are existing rarity meters for assessing NFT rarity, comparing them can be challenging without direct access to the underlying collection data. The Rating over all Rarities (ROAR) benchmark addresses this challenge by providing a standardized framework for evaluating NFT rarity. This paper explores a dimension reduction approach to rarity design, introducing new performance measures and meters, and evaluates them using the ROAR benchmark. Our contributions to the rarity meter design issue include developing an optimal rarity meter design using non-metric weighted multidimensional scaling, introducing Dissimilarity in Trades (DIT) as a performance measure inspired by dimension reduction techniques, and unveiling the non-interpretable rarity meter DIT, which demonstrates superior performance compared to existing methods.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Banking & Finance > Trading (1.00)
- Information Technology > Services > e-Commerce Services (0.48)
- Information Technology > e-Commerce > Financial Technology (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.82)
- (2 more...)