AITopics | sde

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold

Neural Information Processing SystemsJun-21-2026, 10:48:17 GMT

Despite the popularity of the Adam optimizer in practice, most theoretical analyses study Stochastic Gradient Descent (SGD) as a proxy for Adam, and little is known about how the solutions found by Adam differ. In this paper, we show that Adam implicitly reduces a unique form of sharpness measure shaped by its adaptive updates, leading to qualitatively different solutions from SGD. More specifically, when the training loss is small, Adam wanders around the manifold of minimizers and takes semi-gradients to minimize this sharpness measure in an adaptive manner, a behavior we rigorously characterize through a continuous-time approximation using stochastic differential equations. We further demonstrate how this behavior differs from that of SGD in a well-studied setting: when training overparameterized models with label noise, SGD has been shown to minimize the trace of the Hessian matrix, tr(H), whereas we prove that Adam minimizes tr(Diag(H)1/2) instead. In solving sparse linear regression with diagonal linear networks, this distinction enables Adam to achieve better sparsity and generalization than SGD. Finally, our analysis framework extends beyond Adam to a broad class of adaptive gradient methods, including RMSProp, Adam-mini, Adalayer and Shampoo, and provides a unified perspective on how these adaptive optimizers reduce sharpness, which we hope will offer insights for future optimizer design.

implicit bias, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.27)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Integration Matters for Learning PDEs with Backward SDEs

Neural Information Processing SystemsJun-17-2026, 02:03:07 GMT

Backward stochastic differential equation (BSDE)-based deep learning methods provide an alternative to Physics-Informed Neural Networks (PINNs) for solving high-dimensional partial differential equations (PDEs), offering potential algorithmic advantages in settings such as stochastic optimal control, where the PDEs of interest are tied to an underlying dynamical system. However, standard BSDEbased solvers have empirically been shown to underperform relative to PINNs in the literature. In this paper, we identify the root cause of this performance gap as a discretization bias introduced by the standard Euler-Maruyama (EM) integration scheme applied to one-step self-consistency BSDE losses, which shifts the optimization landscape off target. We find that this bias cannot be satisfactorily addressed through finer step-sizes or multi-step self-consistency losses. To properly handle this issue, we propose a Stratonovich-based BSDE formulation, which we implement with stochastic Heun integration. We show that our proposed approach completely eliminates the bias issues faced by EM integration. Furthermore, our empirical results show that our Heun-based BSDE method consistently outperforms EM-based variants and achieves competitive results with PINNs across multiple high-dimensional benchmarks. Our findings highlight the critical role of integration schemes in BSDE-based PDE solvers, an algorithmic detail that has received little attention thus far in the literature.

artificial intelligence, discretization, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise

Forzo, M., Compagnoni, E. Monzio, Russo, A., Pacchiano, A.

arXiv.org Machine LearningJun-17-2026

Temporal difference (TD) learning with linear function approximation is a core method for policy evaluation. Its classical continuous-time description is an ordinary differential equation (ODE), which captures the asymptotic mean dynamics but neglects stochastic fluctuations determining the error floor. We introduce a stochastic differential equation (SDE) approximation for linear TD(0) under Markovian noise. The resulting model distinguishes the contraction dynamics governed by the projected Bellman operator from the influence of Markovian sampling. As a consequence, the model explains the constant-stepsize error floor through the interaction between Markovian long-run covariance and the contraction geometry of the projected Bellman operator.

adiffusion approximation, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2606.18183

Country: Europe (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Symbolic Density Estimation for Discrete Distributions

Liu, Ziwen, Li, Meng

arXiv.org Machine LearningMay-25-2026

Discrete probability laws underpin statistical modeling, yet the catalog of interpretable distributions has expanded only gradually through centuries of case-by-case mathematical derivations. We introduce symbolic density estimation (SDE), an unsupervised framework that automatically recovers closed-form probability mass functions by composing elementary analytic operations within a structured search space. Our method integrates domain-specific structural priors with evolutionary search and a validity-aware inference stage, and it extends to richer distribution families such as zero inflation and finite mixtures. To support systematic evaluation and future research, we contribute a benchmark dataset spanning a broad collection of commonly used discrete distributions. The proposed algorithm recovers all benchmark families with accurate parameter estimates. A real data application shows that it identifies concise and interpretable mixture models that improve goodness-of-fit over standard models.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Machine Learning

2605.21813

Country: North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

A note on connections between the Föllmer process and the denoising diffusion probabilistic model

Koike, Yuta

arXiv.org Machine LearningMay-19-2026

The Föllmer process is a Brownian motion conditioned to have a pre-specified distribution at time 1. This process can be interpreted as an "augmented" time-compressed version of the reverse stochastic differential equation (SDE) for the denoising diffusion probabilistic model (DDPM). While this fact has been indirectly used to analyze DDPM sampling errors via discretization of the reverse SDE, connections between direct discretization of the Föllmer process and the DDPM sampler have not yet been fully explored. This note aims to clarify this point while surveying relevant results from existing work. We show that discretized Föllmer processes give natural hyper-parameter settings of the DDPM sampler. Moreover, this allows us to systematically recover state-of-the-art results on DDPM sampling error bounds with slight improvements.

artificial intelligence, machine learning, ollmer process, (16 more...)

arXiv.org Machine Learning

2605.1804

Country: Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.84)

Add feedback

Latent SDEs on Homogeneous Spaces

Neural Information Processing SystemsMay-1-2026, 05:06:21 GMT

artificial intelligence, machine learning, time point, (14 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

f2543511e5f4d4764857f9ad833a977d-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 06:56:36 GMT

machine learning, natural language, restart, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

2983e3047c0c730d3b7c022584717f3f-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 05:29:51 GMT

artificial intelligence, machine learning, real-world dataset, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)

Add feedback

Supplementary Material for " Noisy Recurrent Neural Networks " ANotation and Background

Neural Information Processing SystemsApr-25-2026, 05:28:28 GMT

We begin by introducing some notations that will be used in this SM.

artificial intelligence, classification margin, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Add feedback

Efficient Sampling on Riemannian Manifolds via Langevin MCMC

Neural Information Processing SystemsApr-25-2026, 04:02:33 GMT

We study the task of efficiently sampling from a Gibbs distribution dπ = e hdvolg over a Riemannian manifold M via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice. The key to our analysis of Langevin MCMC is a bound on the discretization error of the geometric Euler-Murayama scheme, assuming his Lipschitz and M has bounded sectional curvature. Our error bound matches the error of Euclidean Euler-Murayama in terms of its stepsize dependence. Combined with a contraction guarantee for the geometric Langevin Diffusion under Kendall-Cranston coupling, we prove that the Langevin MCMC iterates lie within ε-Wasserstein distance of π after O(ε 2)steps, which matches the iteration complexity for Euclidean Langevin MCMC. Our results apply in general settings where hcan be nonconvex and M can have negative Ricci curvature. Under additional assumptions that the Riemannian curvature tensor has bounded derivatives, and that π satisfies a CD(,) condition, we analyze the stochastic gradient version of Langevin MCMC, and bound its iteration complexity by O(ε 2)as well.

artificial intelligence, machine learning, theorem 1, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Filters

Collaborating Authors

sde

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold

Integration Matters for Learning PDEs with Backward SDEs

A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise

Symbolic Density Estimation for Discrete Distributions

A note on connections between the Föllmer process and the denoising diffusion probabilistic model

Latent SDEs on Homogeneous Spaces

f2543511e5f4d4764857f9ad833a977d-Paper-Conference.pdf

2983e3047c0c730d3b7c022584717f3f-Supplemental.pdf

Supplementary Material for " Noisy Recurrent Neural Networks " ANotation and Background

Efficient Sampling on Riemannian Manifolds via Langevin MCMC