AITopics | lemma 9

Collaborating Authors

lemma 9

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Smoothed Score Queries and the Complexity of Sampling

Liu, Jingbo

arXiv.org Machine LearningMay-28-2026

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic $\sqrtκ$ dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix $Λ$, a smoothed-score query at noise level $τ$ gives access to the resolvent $(Λ+τ^{-1}I)^{-1}$. Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$ smoothed-score queries for total variation error $δ_{\rm TV}$, improving the condition-number dependence from $\sqrtκ$ to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in $κ$; in particular, for fixed dimension and accuracy, the bit complexity is $O(\log^2κ)$. To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an $Ω(\logκ)$ lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.

approximation, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

2605.27769

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

Riedl, Konstantin, Spiliopoulos, Konstantinos, Sirignano, Justin

arXiv.org Machine LearningMay-21-2026

A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a ``Newton neural tangent kernel'' (NNTK). Explicit rates characterizing this convergence are provided and, in the infinite-width limit, we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, leading to slow convergence for target data with high-frequency components. In contrast, the NNTK has uniformly lower bounded eigenvalues if the regularization parameter is selected appropriately, allowing Newton's method to converge more quickly for data with high-frequency components. Mathematical challenges that need to be addressed in our analysis include the implicit parameter update of the Newton method with a potentially indefinite Hessian matrix and the fact that the dimension of this linear system of equations tends to infinity as the NN width grows. This complicates deriving the training dynamics in the overparameterized limit as well as proving the convergence of the finite-width dynamics thereto. The analysis identifies a scaling formula for selecting the regularization parameter, which we show can vanish at a suitable rate as the number of hidden units becomes larger. We prove that, for sufficiently large numbers of hidden units, the regularized Hessian remains positive definite during training and the Newton updates for individual NN parameters converge to zero, showing that the model behaves as a linearization around the initialization.

artificial intelligence, convergence, machine learning, (18 more...)

arXiv.org Machine Learning

2605.08352

Country:

North America > United States (1.00)
Europe (1.00)
North America > Canada (0.68)

Genre: Workflow (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Well-Conditioned Oblivious Perturbations in Linear Space

Chenakkod, Shabarish, Dereziński, Michał, Dong, Xiaoyu, Rudelson, Mark

arXiv.org Machine LearningApr-28-2026

Perturbing a deterministic $n$-dimensional matrix with small Gaussian noise is a cornerstone of smoothed analysis of algorithms [Spielman and Teng, JACM 2004], as it reduces the condition number of the input to $O(n)$, and with it the complexity of many matrix algorithms. However, when deployed algorithmically, these perturbations are expensive due to the cost of generating and storing $n^2$ Gaussian random variables. We propose a perturbation that requires generating and storing $O(n)$ random numbers in $O(\log n)$ bits of precision, and reduces the condition number of any deterministic matrix to $O(n)$, matching Gaussian perturbations. Our result in particular implies a better complexity for the perturbed conjugate gradient algorithm, showing that we can solve an $n\times n$ linear system in linear space to within an arbitrarily small constant backward error using $O(n)$ matrix-vector products. In our construction, we introduce the concept of a pattern matrix, which is a dense deterministic matrix that maps all sparse vectors into dense vectors, and we combine it with a sparse perturbation whose entries are dependent and located in a non-uniform fashion. In order to analyze this construction, we develop new techniques for lower bounding the smallest singular value of a random matrix with dependent entries.

artificial intelligence, machine learning, matrix, (18 more...)

arXiv.org Machine Learning

2604.23193

Country: North America > United States > Michigan (0.27)

Genre: Research Report > New Finding (0.34)

Technology: