AITopics | suffice

A central goal in nonparametric statistics is adaptation: the ability of an estimator to perform simultaneously and optimally across a wide variety of settings with little to no tuning. When inference is carried out over a class of functional spaces, it is desirable that the estimator automatically adapts to unknown features of these spaces, such as smoothness, geometry, sparsity or other finer structural properties. A large body of literature has focused on adaptation: Lepski's method Lepski ı [1990, 1991], thresholding Donoho et al. [1995] and model selection Barron et al. [1999] are amongst the most well-known nonBayesian approaches. Bayesian methods, on the other hand, have a natural ability to achieve adaptation, as we discuss in more detail below, by choosing prior distributions that are flexible enough to achieve this task (one possibility is for instance to draw certain prior parameters at random in a hierarchical Bayes fashion). Recently, motivated by the remarkable empirical success of deep learning methods, there has been a growing interest in understanding how neural networks can automatically learn structural parameters, such as smoothness of functions or'effective' dimensions, for instance in regression settings exhibiting a compositional structure as in Schmidt-Hieber [2020], Kohler and Langer [2021] or for data lying on geometric structures (e.g.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2606.2048

Country: Europe (0.67)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

Neural Information Processing SystemsJun-20-2026, 21:50:48 GMT

We consider a common case of the combinatorial semi-bandit problem, the m-set semi-bandit, where the learner exactly selects m arms from the total d arms. In the adversarial setting, the best regret bound, known to be O( nmd) for time horizon n, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, this requires to explicitly compute the arm-selection probabilities via optimizing problems at each time step and sample according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the m arms that rank among the m smallest (estimated) loss with random perturbation. In this paper, we show that FTPL with a Fréchet perturbation also enjoys the near optimal regret bound O( nm( p dlog(d) + m5/6)) in the adversarial setting and approaches best-of-both-world regret bounds, i.e., achieves a logarithmic regret for the stochastic setting. Moreover, our lower bounds show that the extra factors are unavoidable with our approach; any improvement would require a fundamentally different and more challenging method.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Europe > France (0.28)
North America > United States (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination

Neural Information Processing SystemsJun-20-2026, 12:41:13 GMT

We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Add feedback

Emergence and scaling laws in SGDlearning of shallow neural networks

Neural Information Processing SystemsJun-16-2026, 06:41:17 GMT

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.27)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Structure-Aware Spectral Sparsification via Uniform Edge Sampling

Neural Information Processing SystemsJun-12-2026, 00:15:16 GMT

Spectral clustering is a fundamental method for graph partitioning, but its reliance on eigenvector computation limits scalability to massive graphs. Classical sparsification methods preserve spectral properties by sampling edges proportionally to their effective resistances, but require expensive preprocessing to estimate these resistances. We study whether uniform edge sampling--a simple, structure-agnostic strategy--can suffice for spectral clustering. Our main result shows that for graphs admitting a well-separated $k$-clustering, characterized by a large structure ratio $\Upsilon(k) = \lambda_{k+1} / \rho_G(k)$, uniform sampling preserves the spectral subspace used for clustering. Specifically, we prove that uniformly sampling $O(\gamma^2 n \log n / \varepsilon^2)$ edges, where $\gamma$ is the Laplacian condition number, yields a sparsifier whose top $(n-k)$-dimensional eigenspace is approximately orthogonal to the cluster indicators.

artificial intelligence, name change, proceedings, (7 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.39)

Technology: Information Technology > Artificial Intelligence (0.39)

Add feedback

Limitations of Learning Tanh Neural Networks with Finite Precision

Grohs, Philipp, Trödler, Matěj

arXiv.org Machine LearningJun-12-2026

We investigate limitations of learning $\tanh$ neural networks from point evaluations under finite-precision computations and $L^p$ accuracy guarantees, building on Berner, Grohs, and Voigtländer (2023). Our approach is based on a novel construction of sharply localized bump functions via iterated $\tanh$ activations. Using this mechanism, we show that, in a finite-precision setting, no adaptive randomized algorithm based on $m$ samples can achieve a convergence rate higher than the Monte Carlo rate $O(m^{-1/p})$ in the $L^p$ norm, unless the sampling budget grows exponentially with the size of the network parameters and architecture. The results reveal fundamental limitations imposed by finite precision on the learnability of classes containing localized bump functions, extending previous results for ReLU networks to the $\tanh$ setting.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2606.11104

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AModeling (1)

Neural Information Processing SystemsApr-30-2026, 10:09:10 GMT

These appendices contain demonstrations of the results in the main text as well as additional technical notes.

artificial intelligence, construction, val, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Add feedback

Appendix

Neural Information Processing SystemsApr-27-2026, 21:48:07 GMT

Our results heavily rely on the specific nature of the periodic activation function, so a natural question is to which extent our results can be extended beyond the single periodic neuron class. For lower bounds, a challenging but very interesting generalization would be to establish the cryptographic-hardness of learning certain family of GLMs whose activation function does not need to be periodic. A potentially easier route forward on this direction, would be to consider the Hermite decomposition of the activation function, similar to [A3], and establish lower bounds on the performance of low-degree methods [A23], of SGD [A3], or of local search methods methods [A15], for activation functions whose low-degree Hermite coefficients are exponentially small. For upper bounds, we believe that our proposed LLL-based algorithm may be extended beyond learning even periodic activation functions, such as the cosine activation, by appropriately post-processing the measurements, but leave this for future work. Furthermore, it would be interesting to better understand (empirically or analytically) the noise tolerance of our LLL-based algorithm for "low-frequency" activation functions, such as the absolute value underlying the phase retrieval problem which has "zero" frequency.

artificial intelligence, exp, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback