Learning in High Dimensional Spaces
Does the Barron space really defy the curse of dimensionality?
The Barron space has become prominent in the theory of (shallow) neural networks because it seemingly defies the curse of dimensionality. While the Barron space and its generalizations indeed defy the curse of dimensionality from the point of view of classical smoothness, we herein provide evidence that they do not defy the curse of dimensionality under a nonclassical notion of smoothness which relates naturally to "infinitely wide" shallow neural networks. Just as the Bessel potential spaces are defined via the Fourier transform, we define so-called ADZ spaces via the Mellin transform; these ADZ spaces encapsulate the nonclassical smoothness alluded to above. 38 pages, will appear in the dissertation of the author
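The abstract does not define the ADZ spaces, but the analogy it draws can be made concrete. For orientation only (these are the standard textbook definitions, not the paper's exact constructions): the Bessel potential norm filters the Fourier transform, and the ADZ construction would, by analogy, filter the Mellin transform,

```latex
(\mathcal{M}f)(s) \;=\; \int_0^\infty x^{s-1} f(x)\,\mathrm{d}x,
\qquad\text{vs.}\qquad
\widehat{f}(\xi) \;=\; \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i \langle x,\xi\rangle}\,\mathrm{d}x,
```

where the Bessel potential space $H^{s,p}$ consists of those $f$ with

```latex
\|f\|_{H^{s,p}}
\;=\;
\bigl\| \mathcal{F}^{-1}\bigl[(1+|\xi|^2)^{s/2}\,\widehat{f}\,\bigr] \bigr\|_{L^p}
\;<\;\infty .
```

The Mellin transform is natural here because it diagonalizes dilations on $(0,\infty)$ the way the Fourier transform diagonalizes translations on $\mathbb{R}^d$.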
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Indiana (0.04)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes
Saberi, Seyed Amir Hossein, Najafi, Amir, Motahari, Abolfazl, Khalaj, Babak H.
In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We prove an algorithm exists that, with high probability, outputs a simplex within $\ell_2$ or total variation (TV) distance at most $\varepsilon$ from the true simplex, provided $n \ge (K^2/\varepsilon^2) e^{\mathcal{O}(K/\mathrm{SNR}^2)}$, where $\mathrm{SNR}$ is the signal-to-noise ratio. Extending our prior work~\citep{saberi2023sample}, we derive new information-theoretic lower bounds, showing that simplex estimation within TV distance $\varepsilon$ requires at least $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ samples, where $\sigma^2$ denotes the noise variance. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors. We resolve an open question by demonstrating that when $\mathrm{SNR} \ge \Omega(K^{1/2})$, noisy-case complexity aligns with the noiseless case. Our analysis leverages sample compression techniques (Ashtiani et al., 2018) and introduces a novel Fourier-based method for recovering distributions from noisy observations, potentially applicable beyond simplex learning.
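The data model in the abstract (uniform samples from a simplex, corrupted by isotropic Gaussian noise) is easy to simulate: a uniform draw from a simplex is a Dirichlet$(1,\dots,1)$ mixture of its vertices. A minimal sketch of that generative model, with names of my own choosing (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_simplex_samples(vertices, n, sigma, rng):
    """Draw n points uniformly from the simplex spanned by the rows of
    `vertices`, then corrupt each with isotropic Gaussian noise of
    standard deviation `sigma`."""
    num_vertices = vertices.shape[0]
    # Uniform distribution on the simplex == Dirichlet(1, ..., 1)
    # weights over the vertices.
    w = rng.dirichlet(np.ones(num_vertices), size=n)
    clean = w @ vertices
    return clean + sigma * rng.normal(size=clean.shape)

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # triangle in R^2
X = noisy_simplex_samples(V, n=1000, sigma=0.1, rng=rng)
```

The abstract's $\mathrm{SNR}$ then compares the spread of the clean simplex to `sigma`; the quoted upper bound degrades exponentially in $K/\mathrm{SNR}^2$.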
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.62)
Breaking the curse of dimensionality in structured density estimation
We consider the problem of estimating a structured multivariate density, subject to Markov conditions implied by an undirected graph. In the worst case, without Markovian assumptions, this problem suffers from the curse of dimensionality. Our main result shows how the curse of dimensionality can be avoided or greatly alleviated under the Markov property, and applies to arbitrary graphs. While existing results along these lines focus on sparsity or manifold assumptions, we introduce a new graphical quantity called "graph resilience" and show that it dictates the optimal sample complexity. Surprisingly, although one might expect the sample complexity of this problem to scale with local graph parameters such as the degree, this turns out not to be the case.
- Information Technology > Data Science > Data Mining (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.97)
Contrastive dimension reduction: when and how?
Dimension reduction (DR) is an important and widely studied technique in exploratory data analysis. However, traditional DR methods are not applicable to datasets with a contrastive structure, where data are split into a foreground group of interest (case or treatment group) and a background group (control group). This type of data, common in biomedical studies, necessitates contrastive dimension reduction (CDR) methods to effectively capture information unique to or enriched in the foreground group relative to the background group. Despite the development of various CDR methods, two critical questions remain underexplored: when should these methods be applied, and how can the information unique to the foreground group be quantified? In this work, we address these gaps by proposing a hypothesis test to determine the existence of contrastive information, and introducing a contrastive dimension estimator (CDE) to quantify the unique components in the foreground group. We provide theoretical support for our methods and validate their effectiveness through extensive simulated, semi-simulated, and real experiments involving images, gene expressions, protein expressions, and medical sensors, demonstrating their ability to identify the unique information in the foreground group.
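For readers unfamiliar with the contrastive setting, the classic method in this family is contrastive PCA, which looks for directions of high foreground variance that are not explained by the background. A minimal sketch of that baseline (not the paper's hypothesis test or CDE):

```python
import numpy as np

def contrastive_pca(foreground, background, alpha=1.0, k=2):
    """Top-k eigenvectors of C_fg - alpha * C_bg: directions with high
    variance in the foreground but low variance in the background."""
    C_fg = np.cov(foreground, rowvar=False)
    C_bg = np.cov(background, rowvar=False)
    evals, evecs = np.linalg.eigh(C_fg - alpha * C_bg)
    order = np.argsort(evals)[::-1]   # largest eigenvalues first
    return evecs[:, order[:k]]

# Toy example: both groups are standard normal, but the foreground
# carries extra variance along axis 0 only.
rng = np.random.default_rng(1)
background = rng.normal(size=(500, 5))
foreground = rng.normal(size=(500, 5))
foreground[:, 0] *= 3.0
top = contrastive_pca(foreground, background, alpha=1.0, k=1)
```

The paper's question "when should CDR be applied?" amounts to testing whether any such direction carries genuinely contrastive signal rather than sampling noise.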
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.90)
- Information Technology > Data Science (0.63)
Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning
As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning. This is the aim of semi-supervised learning. To benefit from access to unlabelled data, it is natural to smoothly diffuse knowledge from labelled data to unlabelled data. This motivates the use of Laplacian regularization. Yet, current implementations of Laplacian regularization suffer from several drawbacks, notably the well-known curse of dimensionality.
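The "diffuse labels smoothly over a graph" idea the abstract refers to is the harmonic-function formulation of Laplacian regularization: minimize $f^\top L f$ subject to $f$ matching the known labels. A minimal dense-matrix sketch (a generic baseline, not the paper's improved implementation):

```python
import numpy as np

def laplacian_propagate(X, y, labeled_mask, sigma=1.0):
    """Harmonic label propagation: build a Gaussian-kernel similarity
    graph, form its Laplacian L = D - W, and solve for the unlabeled
    scores so that f^T L f is minimized given the labeled values."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W
    u = ~labeled_mask
    # Stationarity of f^T L f gives L_uu f_u = -L_ul y_l.
    f_u = np.linalg.solve(L[np.ix_(u, u)],
                          -L[np.ix_(u, labeled_mask)] @ y[labeled_mask])
    f = y.astype(float).copy()
    f[u] = f_u
    return f

# Two well-separated clusters, one labeled point each.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
labeled = np.array([True, False, False, True, False, False])
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])
f = laplacian_propagate(X, y, labeled)
```

The curse of dimensionality enters through the kernel graph: in high dimension, pairwise distances concentrate and the diffusion degenerates, which is the drawback the paper targets.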
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)
Few-Shot Diffusion Models Escape the Curse of Dimensionality
While diffusion models have demonstrated impressive performance, there is a growing need for generating samples tailored to specific user-defined concepts. These customized requirements promote the development of few-shot diffusion models, which use a limited number $n_{ta}$ of target samples to fine-tune a pre-trained diffusion model trained on $n_s$ source samples. Moreover, the existing results for diffusion models without a fine-tuning phase cannot explain why few-shot models generate high-quality samples, owing to the curse of dimensionality. In this work, we analyze few-shot diffusion models under a linear structure distribution with a latent dimension $d$. From the optimization perspective, we consider a latent Gaussian special case and prove that the optimization problem has a closed-form minimizer.
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.63)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.60)
Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations. In practice, adversarial training leads to low robust training loss. However, a rigorous explanation for why this happens under natural conditions is still missing. Recently, a convergence theory of standard (non-adversarial) supervised training was developed by various groups for {\em very overparametrized} nets. It is unclear how to extend these results to adversarial training because of the min-max objective.
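The min-max objective the abstract mentions alternates an inner maximization (craft a perturbation against the current model) with an outer minimization (train on the perturbed inputs). A deliberately tiny sketch on a linear classifier with one-step FGSM perturbations, just to make the alternation concrete (the paper's analysis concerns very overparametrized nets, not this toy):

```python
import numpy as np

def fgsm_perturb(w, X, y, eps):
    """Inner max (one step): move each input eps in the signed direction
    that increases the logistic loss of the current weights w."""
    margins = y * (X @ w)
    # d(loss)/dx = -sigmoid(-margin) * y * w for logistic loss.
    grad = -(1.0 / (1.0 + np.exp(margins)))[:, None] * y[:, None] * w[None, :]
    return X + eps * np.sign(grad)

def adversarial_train(X, y, eps=0.1, lr=0.1, steps=200):
    """Outer min: gradient descent on the logistic loss of the
    adversarially perturbed batch at every step."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        X_adv = fgsm_perturb(w, X, y, eps)
        margins = y * (X_adv @ w)
        g = -((1.0 / (1.0 + np.exp(margins)))[:, None] * y[:, None] * X_adv).mean(0)
        w -= lr * g
    return w

# Separable toy data: label is the sign of the first coordinate.
X = np.array([[1.0, 0.3], [1.0, -0.3], [-1.0, 0.3], [-1.0, -0.3]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = adversarial_train(X, y, eps=0.1)
```

The difficulty the abstract points to is that for deep nets the inner maximization is itself nonconvex, so the clean-training convergence proofs do not transfer directly.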
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.44)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.40)
An Incremental Non-Linear Manifold Approximation Method
Hettige, Praveen T. W., Ong, Benjamin W.
Analyzing high-dimensional data presents challenges due to the "curse of dimensionality", making computations intensive. Dimension reduction techniques, categorized as linear or non-linear, simplify such data. Non-linear methods are particularly essential for efficiently visualizing and processing complex data structures in interactive and graphical applications. This research develops an incremental non-linear dimension reduction method using the Geometric Multi-Resolution Analysis (GMRA) framework for streaming data. The proposed method enables real-time data analysis and visualization by incrementally updating the cluster map, PCA basis vectors, and wavelet coefficients. Numerical experiments show that the incremental GMRA accurately represents non-linear manifolds even with small initial samples and aligns closely with batch GMRA, demonstrating efficient updates and maintaining the multiscale structure. The findings highlight the potential of Incremental GMRA for real-time visualization and interactive graphics applications that require adaptive high-dimensional data representations.
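The core streaming primitive behind "incrementally updating PCA basis vectors" is maintaining a running mean and scatter matrix one sample at a time (Welford-style), then refreshing the eigenbasis on demand. A minimal stand-in for the per-node PCA updates inside an incremental GMRA (not the paper's algorithm, which also updates the cluster map and wavelet coefficients):

```python
import numpy as np

class StreamingPCA:
    """Maintain mean and scatter matrix incrementally; the PCA basis is
    recovered from an eigendecomposition whenever it is needed."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))

    def partial_fit(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Welford outer-product update: keeps scatter exact at every step.
        self.scatter += np.outer(delta, x - self.mean)

    def components(self, k):
        cov = self.scatter / max(self.n - 1, 1)
        evals, evecs = np.linalg.eigh(cov)
        return evecs[:, np.argsort(evals)[::-1][:k]]

# Stream 200 points whose dominant variance lies along axis 0.
rng = np.random.default_rng(2)
stream = rng.normal(size=(200, 3))
stream[:, 0] *= 5.0
spca = StreamingPCA(3)
for x in stream:
    spca.partial_fit(x)
```

Because each update is O(dim^2) regardless of how many points have been seen, this is the kind of building block that makes real-time multiscale updates feasible.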
- North America > United States > Michigan (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.52)
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
Park, Giseung, Sung, Youngchul
In this paper, we introduce a simple yet effective reward dimension reduction method to tackle the scalability challenges of multi-objective reinforcement learning algorithms. While most existing approaches focus on optimizing two to four objectives, their abilities to scale to environments with more objectives remain uncertain. Our method uses a dimension reduction approach to enhance learning efficiency and policy performance in multi-objective settings. While most traditional dimension reduction methods are designed for static datasets, our approach is tailored for online learning and preserves Pareto-optimality after transformation. We propose a new training and evaluation framework for reward dimension reduction in multi-objective reinforcement learning and demonstrate the superiority of our method in environments including one with sixteen objectives, significantly outperforming existing online dimension reduction methods.
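The key property the abstract claims, preserving Pareto-optimality after the reward transformation, holds for any entrywise-nonnegative linear map: if reward vector $r$ dominates $r'$ componentwise, then $Ar$ dominates $Ar'$. A minimal illustration of that property (the matrix `A` here is a made-up example, not the paper's learned transformation):

```python
import numpy as np

def reduce_rewards(R, A):
    """Project m-dimensional reward vectors (rows of R) to a lower
    dimension with a nonnegative matrix A. Nonnegativity guarantees
    that componentwise dominance r >= r' survives the projection."""
    assert np.all(A >= 0), "nonnegativity is what preserves dominance"
    return R @ A.T

# Four objectives reduced to two; r_dominant dominates r_dominated.
A = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
r_dominant = np.array([[2.0, 2.0, 2.0, 2.0]])
r_dominated = np.array([[1.0, 1.0, 1.0, 1.0]])
```

This shows why a policy that is Pareto-optimal for the reduced rewards cannot be strictly dominated under the original objectives, which is the soundness requirement for training against the reduced signal.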
- Europe > Austria > Vienna (0.14)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- (10 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)