AITopics | theorem

Collaborating Authors

theorem

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics

Smart, Matthew, Ganguly, Soumya, Metya, Nilava, Morozov, Alexandre V., Sengupta, Anirvan M.

arXiv.org Machine LearningMay-29-2026

We study minimal attention-only transformers under all-token corruption and show they admit a two-stage empirical Bayes interpretation. A single attention step computes a kernel-weighted posterior mean with respect to the empirical distribution defined by the context. Depth refines this distribution through particle dynamics (Stage 1), while a long-range skip-connection carries the noisy input as a query for posterior inference (Stage 2), revealing distinct statistical roles for depth and attention residuals. The framework isolates a minimal setting in which the context itself induces a depth-dependent energy landscape governing in-context inference. We show that effective denoising can emerge without an explicit noise schedule: a fixed kernel bandwidth and finite integration horizon suffice, yielding a principled depth-noise relationship. We further establish a posterior-mean recovery guarantee for a class of well-behaved priors, where the empirical estimator converges to the Bayes-optimal predictor under asymptotic conditions. Connecting these dynamics to reverse-diffusion limits, our results provide a statistical interpretation of attention as in-context inference via sample-based posterior estimation, without explicit density modeling.

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2605.29351

Country:

North America > United States (0.46)
Europe (0.45)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

The Fundamental Limits of Fraud Detection in Card Payment Networks

Dhama, Gaurav

arXiv.org Machine LearningMay-28-2026

Card payment fraud detection is usually framed as a supervised classification problem. Although this approach has generated practical progress, improvement has remained incremental despite major advances in model architecture. We argue that this is not mainly a failure of function approximation or optimization, but a consequence of structural information impairments inherent to the payment ecosystem. We formalize card authorization as a sequential decision problem with delayed, censored, corrupted, and counterfactually missing feedback. We derive a minimax regret lower bound showing that these impairments enter multiplicatively in the denominator of the achievable learning rate. The bound implies that improving issuer reporting quality or reducing censorship can yield larger reductions in the regret floor than increasing model complexity. We also show that heterogeneity across issuers worsens learnability beyond what average impairment rates suggest. The paper contributes a theory of why fraud detection in payment networks is fundamentally harder than in standard online learning settings, identifies ecosystem information quality as the key bottleneck, and provides a theoretical basis for prioritizing investments in reporting infrastructure, dispute process quality, and selective exploration. The paper is theory-first and does not rely on proprietary transaction data.

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Machine Learning

2605.27557

Genre: Research Report (0.40)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Law Enforcement & Public Safety > Fraud (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Smoothed Score Queries and the Complexity of Sampling

Liu, Jingbo

arXiv.org Machine LearningMay-28-2026

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic $\sqrtκ$ dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix $Λ$, a smoothed-score query at noise level $τ$ gives access to the resolvent $(Λ+τ^{-1}I)^{-1}$. Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$ smoothed-score queries for total variation error $δ_{\rm TV}$, improving the condition-number dependence from $\sqrtκ$ to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in $κ$; in particular, for fixed dimension and accuracy, the bit complexity is $O(\log^2κ)$. To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an $Ω(\logκ)$ lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.

approximation, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

2605.27769

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Gaussian Processes with Sample Paths in Reproducing Kernel Banach Spaces

Karvonen, Toni, Sørensen, Rasmus Kleist Hørlyck

arXiv.org Machine LearningMay-28-2026

We investigate the connection between Gaussian processes and Gaussian random elements in reproducing kernel Banach spaces. We show that the covariance operator of a weak second-order Radon probability measure on such a space is uniquely determined by a positive definite function. In the Gaussian case, we characterize those positive definite functions that arise from covariance operators in terms of $γ$-radonifying operators. Building on these results, we extend the classical Driscoll theorem to the Banach space setting.

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Machine Learning

2605.28106

Country:

North America > United States (0.28)
Europe > Finland (0.28)
Europe > Denmark (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

Nonparametric Instrumental Variable Analysis Without Structural Equations: Debiased Inference on Functionals of Inverse Problems with No Solutions

Shen, Zikai, Kallus, Nathan, Meunier, Dimitri, Zenati, Houssam, Gretton, Arthur, Bibaut, Aurélien

arXiv.org Machine LearningMay-27-2026

Instrumental variable (IV) analyses generally start by posing a structural equation: Y = hstructural(X)+ϵ, (1) where hstructural represents the causal effect of X on Y, and X and ϵ may be endogenous (E[ϵ | X] = 0). Then given an exogenous instrument Z satisfying the exclusion restriction, the common statistical solution given joint observations of W = (X,Y,Z) P is to conduct inference on some continuous linear functional h 7 EP[m(W;h)] of a solution h H to the linear equation implied by exclusion: TPh = rP, (2) where TP: H G maps h 7 argming GEP(h(X) g(Z))2, rP = argminr GEP(Y r(Z))2, and H, G are closed linear subspaces of square-integrable functions of X and of Z, respectively. For example, if these are all square-integrable functions, then (TPh)(Z) = EP[h(X) | Z] is the conditional expectation.

artificial intelligence, kernel, machine learning, (19 more...)

arXiv.org Machine Learning

2604.2466

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Shallow ReLU$^s$ Networks in $L^p$-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization

Li, Weizhao, Liu, Fanghui, Shi, Lei

arXiv.org Machine LearningMay-27-2026

Deep learning has shown remarkable effectiveness in high-dimensional approximation problems, particularly in scientific computing, inverse problems, and operator learning (Han et al., 2018; Adcock et al., 2022; Beck et al., 2023). In many such settings, the ReLUs activation σs(t) = max{0,t}s (s N0) is especially relevant because it yields piecewisepolynomial representations that are well suited to smooth targets and derivative-sensitive tasks (Yang and Zhou, 2025; He et al., 2024).

artificial intelligence, barron space, machine learning, (18 more...)

arXiv.org Machine Learning

2605.18468

Country: Asia > China (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Adaptive Calibration in Non-Stationary Environments

Liu, Junyan, Luo, Haipeng, Ratliff, Lillian J.

arXiv.org Machine LearningMay-25-2026

Making calibrated online predictions is a central challenge in modern AI systems. Much of the existing literature focuses on fully adversarial environments where outcomes may be arbitrary, leading to conservative algorithms that can perform suboptimally in more benign settings, such as when outcomes are nearly stationary. This gap raises a natural question: can we design online prediction algorithms whose calibration error automatically adapts to the degree of non-stationarity in the environment, smoothly interpolating between i.i.d. and adversarial regimes? We answer this question in the affirmative and develop a suite of algorithms that achieve adaptive calibration guarantees under multiple calibration measures. Specifically, with $T$ being the number of rounds, $K$ being the unknown number of i.i.d. segments of the environment, and $C\in[0,T]$ being another unknown non-stationary measure defined as the minimal $\ell_1$ deviation of the mean outcomes, our algorithms attain $\widetilde{O}(\min\{\sqrt{T}+(TC)^{\frac{1}{3}}, \sqrt{KT}\})$ for $\ell_1$ calibration error and $\widetilde{O}(\min\{(1+C)^{\frac{1}{3}}, K\})$ for both $\ell_2$ and pseudo KL calibration error. These bounds match the optimal rates in the stationary case ($C=0$ and $K=1$) and recover known guarantees in the fully adversarial regime ($C, K=Ω(T)$). Our approach builds on and extends prior work [Hu et al., 2026, Luo et al., 2025], introducing an epoch-based scheduling together with a novel non-uniform partition of the prediction space that allocates finer resolution near the underlying ground truth.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2605.1149

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity

Caraker, Drake, Arnold, Bryan, Rhoads, David

arXiv.org Machine LearningMay-22-2026

No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. We prove this impossibility, quantify it for four model classes, resolve it via ensemble averaging (DASH), and machine-verify it with 305 Lean 4 theorems. We characterize the complete attribution design space: exactly two families of methods exist -- faithful-complete methods (unstable, with rankings that flip up to 50% of the time) and ensemble methods like DASH (stable, reporting ties for symmetric features) -- and no method lies outside this dichotomy. The impossibility is quantitative: the attribution ratio diverges as 1/(1-rho^2) for gradient boosting, is infinite for Lasso, and converges for random forests. DASH (Diversified Aggregation of SHAP) is provably Pareto-optimal among unbiased aggregations, achieving the Cramer-Rao variance bound with a tight ensemble size formula. In a survey of 77 public datasets, 68% exhibit attribution instability. Switching to conditional SHAP does not escape the impossibility when features have equal causal effects. The framework includes practical diagnostics -- a Z-test workflow and single-model screening tool -- and has direct consequences for fairness auditing: SHAP-based proxy discrimination audits are provably unreliable under collinearity. The design space theorem, diagnostics, and impossibility are mechanically verified in Lean 4 (305 theorems from 16 axioms, 0 sorry) -- to our knowledge, the first formally verified impossibility in explainable AI.

artificial intelligence, instability, machine learning, (13 more...)

arXiv.org Machine Learning

doi: 10.5281/zenodo.19468379

2605.21492

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.45)

Industry:

Banking & Finance (0.67)
Health & Medicine > Therapeutic Area (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.86)

Add feedback

Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds

Fu, Guoji, Suzuki, Taiji, Lee, Wee Sun, Nitanda, Atsushi

arXiv.org Machine LearningMay-18-2026

Score-based generative models are trained in high-dimensional ambient spaces, yet many data distributions are supported on low-dimensional nonlinear structures. We prove that, for compact $d$-dimensional smooth manifolds $\mathcal{M} \subset [0,1]^D$ with $d > 2$ and $β$-Hölder densities strictly positive on $\mathcal{M}$, a variance-preserving SGM estimator attains the intrinsic Wasserstein--1 sample exponent $\tilde{\mathcal{O}}(D^{\mathcal{O}_β(d)}n^{-(β+1)/(d+2β)})$, up to logarithmic factors and explicit geometry and density factors. The full nonasymptotic bound explicitly isolates the finite-order geometry envelope, Hölder radius, density lower bound, ambient dependence, and finite-order correction terms. The analysis separates score approximation into a large-noise tangent-cell regime and a small-noise projection-centered, de-Gaussianized Laplace regime. The key technical ingredient is a ReLU implementation of nearest-projection coordinates via finite intrinsic anchors and Gauss--Newton iterations, rather than approximating the manifold projection as a black-box high-dimensional smooth map. Consequently, for families with polynomially controlled geometry and density lower bounds, the constructed score-network parameters have polynomial ambient dependence.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2605.15822

Country:

Europe (1.00)
North America > United States (0.92)

Genre: Research Report (0.81)

Industry:

Energy (0.45)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.60)

Add feedback

The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

Mansouri, El Mustapha

arXiv.org Machine LearningMay-18-2026

Differential privacy changes the effective sample size governing CVaR learning. For tail mass $τ$, the privacy-relevant sample size is not $n$, but $nτ$; equivalently, the effective private tail sample size is $εnτ$. Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price. This decomposition is complete for scalar estimation and finite classes: scalar estimation has rate $Θ(B \min\{1,(nτ)^{-1/2}+(εnτ)^{-1}\})$, and finite classes of size $M$ have rate $Θ(B \min\{1,\sqrt{\log(2M)/(nτ)}+\log(2M)/(εnτ)\})$. These complete rates hold under pure DP, and their lower bounds extend to approximate DP in the stated small-$δ$ regimes. For convex Lipschitz learning, modular upper and lower reductions show that the CVaR-specific privacy term necessarily scales as $1/(εnτ)$, with dimension dependence inherited from private stochastic convex optimization. Together, these results identify ordinary private learning on $Θ(nτ)$ informative tail records as the canonical hard subproblem inside private CVaR learning.

artificial intelligence, cvar, machine learning, (15 more...)

arXiv.org Machine Learning

2605.16219

Country: Asia > Japan > Honshū > Kantō (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback