AITopics | sup 0

Collaborating Authors

sup 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics

Smart, Matthew, Ganguly, Soumya, Metya, Nilava, Morozov, Alexandre V., Sengupta, Anirvan M.

arXiv.org Machine LearningMay-29-2026

We study minimal attention-only transformers under all-token corruption and show they admit a two-stage empirical Bayes interpretation. A single attention step computes a kernel-weighted posterior mean with respect to the empirical distribution defined by the context. Depth refines this distribution through particle dynamics (Stage 1), while a long-range skip-connection carries the noisy input as a query for posterior inference (Stage 2), revealing distinct statistical roles for depth and attention residuals. The framework isolates a minimal setting in which the context itself induces a depth-dependent energy landscape governing in-context inference. We show that effective denoising can emerge without an explicit noise schedule: a fixed kernel bandwidth and finite integration horizon suffice, yielding a principled depth-noise relationship. We further establish a posterior-mean recovery guarantee for a class of well-behaved priors, where the empirical estimator converges to the Bayes-optimal predictor under asymptotic conditions. Connecting these dynamics to reverse-diffusion limits, our results provide a statistical interpretation of attention as in-context inference via sample-based posterior estimation, without explicit density modeling.

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2605.29351

Country:

North America > United States (0.46)
Europe (0.45)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

High-dimensional Limit of SGD for Diagonal Linear Networks

Malaxechebarría, Begoña García, Paquette, Courtney, Fazel, Maryam, Drusvyatskiy, Dmitriy

arXiv.org Machine LearningMay-19-2026

Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic dynamics are globally well posed and converge exponentially fast to zero risk with high probability, yielding a fully explicit non-asymptotic description of their long-time behavior. Numerical simulations corroborate our theoretical findings.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Machine Learning

2605.17177

Country: North America > United States > New York (0.27)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

Agazzi, Andrea, Bruno, Giuseppe, García, Eloy Mosig, Saviozzi, Samuele, Romito, Marco

arXiv.org Machine LearningApr-30-2026

The transformer architecture [52], which underlies present-day Large Language Models, has been one of the main drivers of recent advances in machine learning and artificial intelligence. At each layer, the hidden state of the network is updated by sequentially applying two distinct operations: attention modules [3], which capture long-range interactions in the input sequence, and classical MultiLayer Perceptrons (MLPs), acting separately on each element of that sequence. Despite their empirical success, the mechanisms governing information propagation through depth, and the way attention and MLP blocks jointly shape internal representations, remain only partially understood from a theoretical viewpoint. Recent progress has come from viewing transformers in suitable scaling limits as deterministic mean-field interacting particle systems modeling the evolution of N tokens1 through the layers of the neural network architecture (the so-called residual stream dynamics), see, among others, [46, 26, 27, 45]. In these descriptions, depth plays the role of a continuous time variable, and, in the large-context regime (N), the evolution of token representations is encoded by a PDE for their empirical distribution. This viewpoint is closely connected to the literature on scaling laws, where the effect of various scaling exponents controlling the relative size of the network's hyperparameters (e.g., depth, width, context length) on the effective dynamics of the model

lemma 2, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2604.26898

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Add feedback

e379877d7880bb1f80c82a9f1c58e6e8-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 11:31:03 GMT

artificial intelligence, assumption, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Greece (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

The Poisson Midpoint Method for Langevin Dynamics: Provably Efficient Discretization for Diffusion Models

Neural Information Processing SystemsFeb-15-2026, 23:58:21 GMT

LMC can suffer from slow convergence - requiring a large number of steps of small step-size to obtain good quality samples.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States (0.04)
Africa > Rwanda > Kigali > Kigali (0.04)

Genre:

Research Report > Experimental Study (0.92)
Workflow (0.67)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

equaltoz = z 1tonormalize andhea Student ' - t-distribp(z) = 8

Neural Information Processing SystemsFeb-10-2026, 11:13:41 GMT

Let w =( 1.5,0,..0) N(0,0.5) Denoting (25) utionofonwsameasthatof (26) eyobservationisthat.., Z1/2w k are Toseewhythisisthecase, wecanvectorizeeachterm: First, let' Lemma ForanyF :Rd R!R+, define problem 1,..., k, as : = su Next, let' 2021) Provingthe 31 Lf (w, b) C(w)2 n (49) tobetheleft(47)(wherethe ( (w),b)isused depends wonlythrough (w)).

artificial intelligence, machine learning, zhouetal, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Organization of the Appendices

Neural Information Processing SystemsFeb-9-2026, 09:24:36 GMT

In the Appendix, we give proofs of all results from the main text. We say a function f: R Y! R is M -Lipschitz if for any y 2Y and ˆ y We can also define the Moreau envelope of a function f: R Y! R by The proof of all results in this section can be straightforwardly extended to these settings. Boyd et al. 2004; Bauschke, Combettes, et al. 2011; Rockafellar 1970), but is also useful and Interestingly, there is a similar equivalent characterization for Lipschitz functions as well. Finally, we show that any smooth loss is square-root-Lipschitz. Lipschitz losses is more general than the class of smooth losses studied in Srebro et al. 2010 .

artificial intelligence, machine learning, probability, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

TheHighLine: ExactRiskandLearningRateCurves ofStochasticAdaptiveLearningRateAlgorithms

Neural Information Processing SystemsFeb-7-2026, 19:03:00 GMT

We then investigate in detail two adaptivelearning rates-anidealized exactlinesearch andAdaGrad-Norm -on the least squares problem.

artificial intelligence, def, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
North America > Canada > Quebec (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Pathway to $O(\sqrt{d})$ Complexity bound under Wasserstein metric of flow-based models

Meng, Xiangjun, Wang, Zhongjian

arXiv.org Artificial IntelligenceDec-9-2025

We provide attainable analytical tools to estimate the error of flow-based generative models under the Wasserstein metric and to establish the optimal sampling iteration complexity bound with respect to dimension as $O(\sqrt{d})$. We show this error can be explicitly controlled by two parts: the Lipschitzness of the push-forward maps of the backward flow which scales independently of the dimension; and a local discretization error scales $O(\sqrt{d})$ in terms of dimension. The former one is related to the existence of Lipschitz changes of variables induced by the (heat) flow. The latter one consists of the regularity of the score function in both spatial and temporal directions. These assumptions are valid in the flow-based generative model associated with the Föllmer process and $1$-rectified flow under the Gaussian tail assumption. As a consequence, we show that the sampling iteration complexity grows linearly with the square root of the trace of the covariance operator, which is related to the invariant distribution of the forward process.

artificial intelligence, assumption 3, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2512.06702

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Add feedback

Deep Neural Operator Learning for Probabilistic Models

Bayraktar, Erhan, Feng, Qi, Zhang, Zecheng, Zhang, Zhaoyu

arXiv.org Artificial IntelligenceNov-11-2025

We propose a deep neural-operator framework for a general class of probability models. Under global Lipschitz conditions on the operator over the entire Euclidean space-and for a broad class of probabilistic models-we establish a universal approximation theorem with explicit network-size bounds for the proposed architecture. The underlying stochastic processes are required only to satisfy integrability and general tail-probability conditions. We verify these assumptions for both European and American option-pricing problems within the forward-backward SDE (FBSDE) framework, which in turn covers a broad class of operators arising from parabolic PDEs, with or without free boundaries. Finally, we present a numerical example for a basket of American options, demonstrating that the learned model produces optimal stopping boundaries for new strike prices without retraining.

artificial intelligence, machine learning, sup 0, (20 more...)

arXiv.org Artificial Intelligence

2511.07235

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.63)

Industry: Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback