AITopics | theannalsofstatistic

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space

Neural Information Processing Systems, Apr-25-2026, 00:11:52 GMT

Deep learning has exhibited superior performance for various tasks, especially for high-dimensional datasets, such as images. To understand this property, we investigate the approximation and estimation ability of deep learning on anisotropic Besov spaces. The anisotropic Besov space is characterized by direction-dependent smoothness and includes several function classes that have been investigated thus far. We demonstrate that the approximation error and estimation error of deep learning only depend on the average value of the smoothness parameters in all directions. Consequently, the curse of dimensionality can be avoided if the smoothness of the target function is highly anisotropic. Unlike existing studies, our analysis does not require a low-dimensional structure of the input data. We also investigate the minimax optimality of deep learning and compare its performance with that of the kernel method (more generally, linear estimators). The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.

artificial intelligence, dimensionality, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Symmetry Guarantees Statistic Recovery in Variational Inference

Marks, Daniel, Paccagnan, Dario, van der Wilk, Mark

arXiv.org Machine Learning, Apr-21-2026

Variational inference (VI) is a central tool in modern machine learning, used to approximate an intractable target density by optimising over a tractable family of distributions. As the variational family cannot typically represent the target exactly, guarantees on the quality of the resulting approximation are crucial for understanding which of its properties VI can faithfully capture. Recent work has identified instances in which symmetries of the target and the variational family enable the recovery of certain statistics, even under model misspecification. However, these guarantees are inherently problem-specific and offer little insight into the fundamental mechanism by which symmetry forces statistic recovery. In this paper, we overcome this limitation by developing a general theory of symmetry-induced statistic recovery in variational inference. First, we characterise when variational minimisers inherit the symmetries of the target and establish conditions under which these pin down identifiable statistics. Second, we unify existing results by showing that previously known statistic recovery guarantees in location-scale families arise as special cases of our theory. Third, we apply our framework to distributions on the sphere to obtain novel guarantees for directional statistics in von Mises-Fisher families. Together, these results provide a modular blueprint for deriving new recovery guarantees for VI in a broad range of symmetry settings.

artificial intelligence, machine learning, sd 1, (18 more...)

arXiv.org Machine Learning

2604.1831

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

Early-stopped aggregation: Adaptive inference with computational efficiency

Ohn, Ilsang, Fan, Shitao, Jun, Jungbin, Lin, Lizhen

arXiv.org Machine Learning, Apr-17-2026

When considering a model selection or, more generally, an aggregation approach for adaptive statistical inference, it is often necessary to compute estimators over a wide range of model complexities including unnecessarily large models even when the true data-generating process is relatively simple, due to the lack of prior knowledge. This requirement can lead to substantial computational inefficiency. In this work, we propose a novel framework for efficient model aggregation called early-stopped aggregation (ESA): instead of computing and aggregating estimators for all candidate models, we compute only a small number of simpler ones using an early-stopping criterion and aggregate only these for final inference. Our framework is versatile and applies to both Bayesian model selection, in particular, within the variational Bayes framework, and frequentist estimation, including a general penalized estimation setting. We investigate the adaptive optimality of the ESA approach across three learning paradigms. We first show that ESA achieves optimal adaptive contraction rates in the variational Bayes setting under mild conditions. We extend this result to variational empirical Bayes, where prior hyperparameters are chosen in a data-dependent manner. In addition, we apply the ESA approach to frequentist aggregation including both penalization-based and sample-splitting implementations, and establish corresponding theory. As we demonstrate, there is a clear unification between early-stopped Bayes and frequentist penalized aggregation, with a common "energy" functional comprising a data-fitting term and a complexity-control term that drives both procedures. We further present several applications and numerical studies that highlight the efficiency and strong performance of the proposed approach.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2604.14404

Country:

Europe > United Kingdom (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (0.50)

Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

Neural Information Processing Systems, Feb-19-2026, 02:42:34 GMT

We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of $\ell_1$ penalty with respect to $\ell_2$; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for $K > 2$.

artificial intelligence, machine learning, preprintarxiv, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Switzerland > Vaud > Lausanne (0.05)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

f7ae58c7f1a1cc4abe9273a0f971ba2a-Paper.pdf

Neural Information Processing Systems, Feb-15-2026, 04:33:52 GMT

Variational Bayes (VB) is a scalable alternative to Markov chain Monte Carlo (MCMC) for Bayesian posterior inference.

artificial intelligence, machine learning, posterior, (18 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Rates of Convergence for Large-scale Nearest Neighbor Classification

Xingye Qiao, Jiexin Duan, Guang Cheng

Neural Information Processing Systems, Feb-14-2026, 18:11:24 GMT

In addition to the memory limitation, there are other important concerns.

artificial intelligence, classifier, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Indiana (0.04)
North America > United States > New York > Broome County > Binghamton (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)

Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing

Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie Su

Neural Information Processing Systems, Feb-13-2026, 08:40:17 GMT

The regularizer $\sum_i \lambda_i |b|_{(i)}$ is a sorted $\ell_1$-norm (denoted as $J_\lambda(b)$ henceforth), which is non-separable due to the sorting operation involved in its calculation.
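
As a rough illustration of the sorted ℓ1 norm mentioned in this snippet, the sketch below computes it under the standard SLOPE convention, which is an assumption here since the snippet does not spell it out: the weights are non-negative and non-increasing, and they are paired with the entries of b sorted by decreasing magnitude. The function name `sorted_l1_norm` is hypothetical and does not come from the paper.

```python
def sorted_l1_norm(b, lam):
    """Sorted L1 norm: sum over i of lam[i] * (i-th largest |b| entry).

    Assumes the usual SLOPE convention (not stated in the snippet):
    lam is non-negative and non-increasing, same length as b.
    """
    if len(b) != len(lam):
        raise ValueError("b and lam must have the same length")
    if any(x < 0 for x in lam):
        raise ValueError("lam must be non-negative")
    if any(lam[i] < lam[i + 1] for i in range(len(lam) - 1)):
        raise ValueError("lam must be non-increasing")
    # Sorting by magnitude is the step that couples the coordinates,
    # which is why the penalty is not a sum of per-coordinate terms.
    magnitudes = sorted((abs(x) for x in b), reverse=True)
    return sum(weight * mag for weight, mag in zip(lam, magnitudes))


if __name__ == "__main__":
    print(sorted_l1_norm([1, -3, 2], [3, 2, 1]))
```

With equal weights the sort has no effect and the value reduces to a scaled ordinary ℓ1 norm; with distinct weights the largest coefficients receive the heaviest penalty regardless of their position in b.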

artificial intelligence, arxivpreprintarxiv, montanari, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence (0.46)

f6ccfa588d2a95bef5a3b101c02524c9-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 21:56:06 GMT

It is known that Binary Segmentation is consistent but not optimal (Venkatraman (1992)). As an improvement, Fryzlewicz (2014) proposes Wild Binary Segmentation and shows that it has a better localization rate.

artificial intelligence, bappendix, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Detecting Abrupt Changes in Sequential Pairwise Comparison Data

Neural Information Processing Systems, Feb-12-2026, 21:56:03 GMT

In this paper we are concerned with localizing the change points in a high-dimensional BTL model with piecewise constant parameters.

artificial intelligence, change point, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
