AITopics | effective dimension

effective dimension

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On Local Population-Risk Certificates

Song, Mingzhi

arXiv.org Machine Learning, Jun-30-2026

We develop finite-sample certificates for local population-risk increments $P\delta_v=R(\theta_0+v)-R(\theta_0)$, $v\in\mathcal D$. The primitive object is an expected-valid upper endpoint $\widehat{\mathsf U}_{\mathcal D}$ satisfying $\mathbb E\sup_{v\in\mathcal D} \{P\delta_v-\widehat{\mathsf U}_{\mathcal D}(v)\}\le0$. This uniform criterion certifies any measurable update selected from the same sample and allows penalties to depend on empirical geometry. The main construction is a cross-fitted ridge calibration for linear feature classes. A pilot fold learns the ridge metric, the complementary fold calibrates the squared mean error in that metric, and complete split averaging recovers the full empirical covariance in the directional quadratic form $\widehat q_{X,\lambda}$. The optimized diagnostic scale is $\{\widehat q_{X,\lambda}(h)\, \widehat r_{X,n_{\rm p},\lambda}^{\rm cf}/n\}^{1/2}$, and the calibrated trace factor $\widehat r_{X,n_{\rm p},\lambda}^{\rm cf}$ is compared with the ordinary ridge effective dimension $\widehat r_{X,\lambda}$. For nonsmooth losses, an exact fixed-mask decomposition $\delta_v=J_v^0+R_v^\circ+C_v$ separates frozen Taylor fluctuations, good-path remainders, and interface crossings. Applying the linear and composite certificates componentwise yields endpoints for same-sample expected local search and concentrated release rules.
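A minimal NumPy sketch of the ordinary ridge effective dimension $\widehat r_{X,\lambda}$ mentioned above, assuming the common definition $\operatorname{tr}(\widehat\Sigma(\widehat\Sigma+\lambda I)^{-1})$ with $\widehat\Sigma = X^\top X/n$. That normalization is an assumption (the abstract does not state it), the function name is hypothetical, and the cross-fitted factor $\widehat r_{X,n_{\rm p},\lambda}^{\rm cf}$ is not attempted because its definition is not given here.

```python
import numpy as np


def ridge_effective_dimension(X: np.ndarray, lam: float) -> float:
    """Ordinary ridge effective dimension tr(S (S + lam I)^{-1}), S = X^T X / n.

    The 1/n normalization is an assumption; centering or lam * n variants
    are also common.
    """
    n = X.shape[0]
    # Eigenvalues of S are the squared singular values of X divided by n.
    s = np.linalg.svd(X, compute_uv=False) ** 2 / n
    # Each direction contributes s_i / (s_i + lam), a number in [0, 1).
    return float(np.sum(s / (s + lam)))


# Usage on a hypothetical design with decaying feature scales.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50)) * (0.8 ** np.arange(50))
for lam in (1e-3, 1e-2, 1e-1, 1.0):
    print(lam, round(ridge_effective_dimension(X, lam), 2))
```

The quantity counts directions whose empirical variance is not swamped by the ridge penalty: it is at most the rank of $X$ and decreases as $\lambda$ grows.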

artificial intelligence, certificate, machine learning

arXiv:2606.19147

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining

Neural Information Processing Systems, Jun-17-2026, 02:36:50 GMT

While machine learning models become more capable in discriminative tasks at scale, their ability to overcome biases introduced by training data has come under increasing scrutiny. Previous results suggest that there are two extremes of parameterization with very different behaviors: the population (underparameterized) setting where loss weighting is optimal and the separable overparameterized setting where loss weighting is ineffective at ensuring equal performance across classes. This work explores the regime of last layer retraining (LLR) in which the unseen limited (retraining) data is frequently inseparable and the model proportionately sized, falling between the two aforementioned extremes. We show, in theory and practice, that loss weighting is still effective in this regime, but that these weights must take into account the relative overparameterization of the model.
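The abstract does not give the form of the optimal weights, so the sketch below only shows the mechanism being tuned: a class-weighted logistic loss minimized over a linear head on frozen features. The inverse-frequency weights in the usage lines are a placeholder heuristic, not the overparameterization-aware weights derived in the paper, and all names and data are hypothetical.

```python
import numpy as np


def weighted_logistic_loss_and_grad(w, X, y, class_weights):
    """Class-weighted logistic loss for labels y in {0, 1}, and its gradient.

    Every example of class c is multiplied by class_weights[c]; the total is
    divided by the summed weights so losses are comparable across weightings.
    """
    sample_w = class_weights[y]                    # per-example weight
    z = X @ w                                      # logits of the linear head
    # log(1 + exp(-z)) if y = 1, log(1 + exp(z)) if y = 0, computed stably.
    losses = np.logaddexp(0.0, np.where(y == 1, -z, z))
    p = 0.5 * (1.0 + np.tanh(0.5 * z))             # sigmoid(z) without overflow
    total = np.sum(sample_w)
    loss = np.sum(sample_w * losses) / total
    grad = X.T @ (sample_w * (p - y)) / total
    return loss, grad


def retrain_last_layer(features, y, class_weights, lr=0.5, steps=500):
    """Gradient descent on the weighted loss; the feature extractor stays frozen."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, grad = weighted_logistic_loss_and_grad(w, X, y, class_weights)
        w -= lr * grad
    return w


# Hypothetical frozen features: 180 majority (y = 0) and 20 minority (y = 1) points.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1.0, (180, 2)), rng.normal(1.5, 1.0, (20, 2))])
labels = np.array([0] * 180 + [1] * 20)
w_uniform = retrain_last_layer(feats, labels, np.array([1.0, 1.0]))
# Placeholder inverse-frequency weights, not the paper's weights.
w_weighted = retrain_last_layer(feats, labels, np.array([1.0, 9.0]))
```

Normalizing by the summed weights keeps the loss scale fixed, so only the relative weight between classes moves the decision boundary.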

artificial intelligence, machine learning, weighting

Country: North America > Canada > Ontario (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Generalization in Nonlinear Least Squares via Learned Feature Geometry

Kharel, Ayub, Kuzborskij, Ilja, Rebeschini, Patrick, Abbasi-Yadkori, Yasin

arXiv.org Machine Learning, Jun-10-2026

We study the generalization of ridge-regularized nonlinear least-squares models via on-average algorithmic stability, deriving error bounds for local minimizers in terms of a data-dependent effective dimension that reflects the geometry of the gradient model at the trained parameters, through the empirical Jacobian Gram matrix and a residual-curvature term. In the linear case, where the curvature term vanishes, this recovers the classical effective dimension of the Jacobian kernel covariance, but evaluated at the trained model rather than at initialization as is typical in neural tangent kernel analyses. We further bound this effective dimension via covering complexity of the gradient features, leading to guarantees that depend on learned geometry rather than parameter count. In particular, for manifold-supported data and piecewise Lipschitz Jacobians, the bounds scale with intrinsic dimension, while for one-hidden-layer ReLU networks, the mechanism can be made explicit through counts of activation-stable regions. Experiments on synthetic manifolds, clustered distributions, and benchmark datasets illustrate trained-Jacobian compression, the tightness of the residual-curvature linearization, and agreement between the stability bound and observed generalization gaps. A key feature of our bounds is the simplicity of their derivation, which follows from first principles using the Brascamp-Lieb inequality under strongly log-concave noise.
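To make the data-dependent effective dimension concrete, here is a sketch for a hypothetical one-hidden-layer ReLU network $f(x)=\sum_j a_j\,\mathrm{relu}(w_j^\top x)$. It assumes the effective dimension takes the form $\operatorname{tr}(G(G+n\lambda I)^{-1})$ for the Jacobian Gram matrix $G=JJ^\top$ at given parameters and omits the residual-curvature term entirely, so it covers the linearized part only. The activation-pattern count is an illustrative stand-in for the activation-stable regions the abstract mentions.

```python
import numpy as np


def relu_net_jacobian(X, W, a):
    """Rows are d f(x_i) / d(a, W) for f(x) = sum_j a_j * relu(w_j . x)."""
    pre = X @ W.T                              # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                 # d f / d a_j = relu(w_j . x)
    mask = (pre > 0).astype(X.dtype)           # ReLU derivative, taken as 0 at the kink
    # d f / d w_j = a_j * 1[w_j . x > 0] * x, flattened to m * d entries per input.
    dW = (mask * a)[:, :, None] * X[:, None, :]
    return np.hstack([act, dW.reshape(X.shape[0], -1)])


def jacobian_effective_dimension(J, lam):
    """tr(G (G + n lam I)^{-1}) with G = J J^T; no residual-curvature correction."""
    n = J.shape[0]
    g = np.clip(np.linalg.eigvalsh(J @ J.T), 0.0, None)   # drop round-off negatives
    return float(np.sum(g / (g + n * lam)))


def count_activation_patterns(X, W):
    """Number of distinct hidden-unit on/off patterns realized on the sample."""
    return len({tuple(row) for row in (X @ W.T > 0)})


# Hypothetical parameters standing in for a trained network: d = 3 inputs, m = 16 units.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
W = rng.standard_normal((16, 3))
a = rng.standard_normal(16)
J = relu_net_jacobian(X, W, a)                 # shape (200, 16 + 16 * 3) = (200, 64)
print(jacobian_effective_dimension(J, 1e-3), count_activation_patterns(X, W))
```

For a linear model the Jacobian is $X$ itself, and the same function returns $\sum_i s_i/(s_i+\lambda)$ with $s_i$ the eigenvalues of $X^\top X/n$, the classical ridge effective dimension.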

artificial intelligence, machine learning, natural language

arXiv:2606.08799

Country: North America > United States (0.93)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.45)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Sharper Guarantees for Misspecified Kernelized Bandit Optimization

Maran, Davide, Szepesvári, Csaba

arXiv.org Machine Learning, May-8-2026

Existing guarantees for misspecified kernelized bandit optimization pay for misspecification through kernel complexity: in generic offline bounds, the misspecification level $\varepsilon$ is multiplied by $\sqrt{d_\mathrm{eff}}$, where $d_\mathrm{eff}$ is the kernel effective dimension, while in online regret bounds, the corresponding penalty is $\sqrt{\gamma_n}\,n\varepsilon$, where $\gamma_n$ is the maximum information gain after $n$ rounds of interaction. In this work, we show that, for a large class of kernels, the misspecification amplification can be reduced to logarithmic or polylogarithmic growth. In the offline setting, we first prove high-probability simple-regret bounds whose misspecification term is governed by a spectral Lebesgue constant. This yields logarithmic amplification for one-dimensional monotone spectra and polylogarithmic amplification for multivariate Fourier-diagonal product kernels. In the online setting, we modify a domain-splitting algorithm and prove a cumulative regret bound of $\widetilde{\mathcal O}(\sqrt{\gamma_n n}+n\varepsilon)$ under mild localized eigendecay assumptions, removing the extra $\sqrt{\gamma_n}$ factor from the misspecification term. The common principle is localization: spectral localization controls the Lebesgue constant of the offline approximation operator, while domain splitting implements the spatial analogue of this mechanism in the online setting, preventing local misspecification errors from being amplified globally.
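For reference, the two complexity measures named in the abstract can be computed for a fixed kernel matrix. The sketch assumes the common conventions $d_\mathrm{eff}(\lambda)=\operatorname{tr}(K(K+\lambda I)^{-1})$ and information gain $\tfrac12\log\det(I+\sigma^{-2}K)$; papers differ in scaling ($K/n$, $\lambda n$, noise variance). Since $\gamma_n$ is the maximum of the information gain over all sets of $n$ points, evaluating it on one fixed set, as here, only gives a lower bound on $\gamma_n$. The data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def effective_dimension(K, lam):
    """d_eff(lam) = tr(K (K + lam I)^{-1}) = sum_i mu_i / (mu_i + lam)."""
    mu = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return float(np.sum(mu / (mu + lam)))


def information_gain(K, noise_var):
    """0.5 * log det(I + K / noise_var) for the fixed point set behind K."""
    _, logdet = np.linalg.slogdet(np.eye(K.shape[0]) + K / noise_var)
    return 0.5 * float(logdet)


# Hypothetical example: 200 equally spaced points on [0, 1].
x = np.linspace(0.0, 1.0, 200)[:, None]
K = rbf_kernel(x, x, lengthscale=0.2)
print(effective_dimension(K, 1e-2), information_gain(K, 1e-2))
```

Both quantities are sums over the kernel spectrum, which is why fast eigendecay (as for the Gaussian kernel) keeps them small relative to the number of points.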

artificial intelligence, kernel, machine learning

arXiv:2605.05967

Country: North America (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Spectral bandits

Kocák, Tomáš, Munos, Rémi, Kveton, Branislav, Agrawal, Shipra, Valko, Michal

arXiv.org Machine Learning, Apr-29-2026

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph, and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
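The abstract does not spell out its estimator. A standard way to exploit smoothness on a graph, assumed here, is Laplacian-regularized least squares: penalize $f^\top(L+\lambda I)f$, where $L$ is the graph Laplacian, so that each evaluation also informs neighboring nodes. The sketch is not one of the paper's three algorithms (it has no exploration strategy); it only illustrates how a smooth payoff over 50 nodes of a hypothetical path graph can be recovered from 10 evaluations.

```python
import numpy as np


def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-node path graph."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def laplacian_regularized_estimate(L, observed, rewards, lam=0.01):
    """argmin_f  sum_t (f[i_t] - r_t)^2 + f^T (L + lam I) f  over node values f.

    f^T L f = sum over edges (i, j) of (f_i - f_j)^2, so the penalty favors payoffs
    that vary little between neighbors and spreads each evaluation to nearby nodes.
    """
    n = L.shape[0]
    S = np.zeros((len(observed), n))
    S[np.arange(len(observed)), observed] = 1.0      # row t selects node i_t
    # Setting the gradient to zero gives (S^T S + L + lam I) f = S^T r.
    return np.linalg.solve(S.T @ S + L + lam * np.eye(n), S.T @ rewards)


# Hypothetical example: 50 nodes on a path, smooth payoff, 10 evaluated nodes.
n = 50
L = path_graph_laplacian(n)
true_payoff = np.sin(2.0 * np.pi * np.arange(n) / n)
observed = np.arange(0, n, 5)
estimate = laplacian_regularized_estimate(L, observed, true_payoff[observed])
rel_err = np.linalg.norm(estimate - true_payoff) / np.linalg.norm(true_payoff)
print(rel_err)
```

With noiseless evaluations the estimate interpolates roughly linearly between evaluated nodes, slightly shrunk toward zero by the penalty.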

artificial intelligence, data mining, machine learning

arXiv:2604.25272

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Provably Strict Generalisation Benefit for Invariance in Kernel Methods

Neural Information Processing Systems, Apr-26-2026, 16:25:44 GMT

It is a commonly held belief that enforcing invariance improves generalisation. Although this approach enjoys widespread popularity, it is only very recently that a rigorous theoretical demonstration of this benefit has been established. In this work we build on the function space perspective of Elesedy and Zaidi [8] to derive a strictly non-zero generalisation benefit of incorporating invariance in kernel ridge regression when the target is invariant to the action of a compact group. We study invariance enforced by feature averaging and find that generalisation is governed by a notion of effective dimension that arises from the interplay between the kernel and the group. In building towards this result, we find that the action of the group induces an orthogonal decomposition of both the reproducing kernel Hilbert space and its kernel, which may be of interest in its own right.
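Feature averaging can be sketched with a finite group standing in for the compact group (uniform weights in place of the Haar measure). Averaging features over orbits, $\bar\phi(x)=|G|^{-1}\sum_{g\in G}\phi(gx)$, yields the kernel $k_G(x,x')=|G|^{-2}\sum_{g,h\in G}k(gx,hx')$, and kernel ridge regression with $k_G$ returns a $G$-invariant predictor. The group $\{x\mapsto x,\ x\mapsto -x\}$, the Gaussian base kernel and the data are hypothetical choices; the sketch shows invariance, not the generalization gap the paper quantifies.

```python
from functools import partial

import numpy as np


def rbf(A, B, lengthscale=1.0):
    """Gaussian base kernel between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def averaged_kernel(A, B, group, base=rbf):
    """k_G(x, x') = |G|^-2 sum_{g, h} k(g x, h x'): inner product of averaged features."""
    total = np.zeros((A.shape[0], B.shape[0]))
    for g in group:                # group elements are d x d matrices acting as x -> g x
        for h in group:
            total += base(A @ g.T, B @ h.T)
    return total / len(group) ** 2


def fit_krr(X, y, kernel, lam):
    """Kernel ridge regression: alpha = (K + n lam I)^{-1} y; returns a predictor."""
    n = X.shape[0]
    alpha = np.linalg.solve(kernel(X, X) + n * lam * np.eye(n), y)
    return lambda Z: kernel(Z, X) @ alpha


# Hypothetical setup: G = {x -> x, x -> -x} acting on R^2, invariant target |x|^2.
d = 2
group = [np.eye(d), -np.eye(d)]
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, (100, d))
y = np.sum(X**2, axis=1) + 0.1 * rng.standard_normal(100)
predict = fit_krr(X, y, partial(averaged_kernel, group=group), lam=1e-3)
Z = rng.uniform(-2.0, 2.0, (5, d))
print(predict(Z) - predict(-Z))    # zero up to floating-point round-off
```

Invariance holds because replacing $x$ by $gx$ only permutes the terms of the double sum defining $k_G$.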

artificial intelligence, invariance, machine learning

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.41)

ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions

Neural Information Processing Systems, Apr-25-2026, 09:56:48 GMT

We introduce ParK, a new large-scale solver for kernel ridge regression. Our approach combines partitioning with random projections and iterative optimization to reduce space and time complexity while provably maintaining the same statistical accuracy. In particular, by constructing suitable partitions directly in the feature space rather than in the input space, we promote orthogonality between the local estimators, thus ensuring that key quantities such as local effective dimension and bias remain under control. We characterize the statistical-computational tradeoff of our model, and demonstrate the effectiveness of our method by numerical experiments on large-scale datasets.
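A simplified sketch of partitioning in feature space, assuming nearest-center cells computed purely from kernel evaluations via $\|\phi(x)-\phi(c)\|^2=k(x,x)-2k(x,c)+k(c,c)$, with one kernel ridge regression per cell and its local effective dimension $\operatorname{tr}(K_c(K_c+n_c\lambda I)^{-1})$. It omits the random projections and iterative optimization that ParK combines with partitioning, and the choice of centers is a hypothetical placeholder (data points, so no cell is empty). For the Gaussian kernel used here, feature-space distance is a monotone function of input distance, so the cells coincide with input-space Voronoi cells; the distinction matters for other kernels.

```python
import numpy as np


def rbf(A, B, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def self_kernel(Z, kernel):
    """k(z, z) for every row z of Z, without forming the full kernel matrix."""
    return np.array([kernel(z[None, :], z[None, :])[0, 0] for z in Z])


def assign_cells(Z, centers, kernel):
    """Nearest center in feature space: ||phi(z) - phi(c)||^2 = k(z,z) - 2k(z,c) + k(c,c)."""
    dist2 = (self_kernel(Z, kernel)[:, None] - 2.0 * kernel(Z, centers)
             + self_kernel(centers, kernel)[None, :])
    return np.argmin(dist2, axis=1)


def fit_partitioned_krr(X, y, centers, kernel, lam):
    """One kernel ridge regression per cell, plus each cell's local effective dimension."""
    cells = assign_cells(X, centers, kernel)
    models, local_dims = [], []
    for c in range(len(centers)):
        Xc, yc = X[cells == c], y[cells == c]
        n_c = len(yc)
        K = kernel(Xc, Xc)
        models.append((Xc, np.linalg.solve(K + n_c * lam * np.eye(n_c), yc)))
        mu = np.clip(np.linalg.eigvalsh(K), 0.0, None)
        local_dims.append(float(np.sum(mu / (mu + n_c * lam))))

    def predict(Z):
        # Route each query to its cell and evaluate that cell's local estimator.
        zc = assign_cells(Z, centers, kernel)
        out = np.zeros(len(Z))
        for c, (Xc, alpha) in enumerate(models):
            mask = zc == c
            if mask.any():
                out[mask] = kernel(Z[mask], Xc) @ alpha
        return out

    return predict, local_dims


# Hypothetical data; centers are data points so that no cell is empty.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 6.0, 200))[:, None]
y = np.sin(X[:, 0])
centers = X[[25, 75, 125, 175]]
predict, local_dims = fit_partitioned_krr(X, y, centers, rbf, lam=1e-3)
train_mse = np.mean((predict(X) - y) ** 2)
print(local_dims, train_mse)
```

Each local solve costs $O(n_c^3)$ instead of $O(n^3)$ for a single global solve, which is the computational motivation for partitioning.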

artificial intelligence, estimator, machine learning

Technology: