AITopics | ode

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models Supplementary material

Neural Information Processing Systems | May-1-2026, 06:27:07 GMT

The appendix is organized into five sections as follows: 1. Appendix A derives the Volterra equation and proves the main result for the homogenized SGD (Theorem 1). 2. Appendix B gives a heuristic derivation of the homogenized SGD approximation to the SDA class of algorithms on the least squares problem and shows that SGD and homogenized SGD are close under orthogonal invariance (Theorem 2). 3. Appendix C gives a general overview of the analysis of a convolution Volterra equation of the type that arises in the SDA class. Unless otherwise stated, all the results hold under Assumptions 1 and 2. We include all statements from the previous sections for clarity. The results presented in this paper concern the analysis of existing methods and a new method that is a variant of an existing method. The results are theoretical and we do not anticipate any direct ethical and societal issues. We believe the results will be used by machine learning practitioners and we encourage them to use them to build a more just, prosperous world. A.1 Homogenized SGD. We recall that the diffusion model is given by dXt = 2 dZt 1 To connect these diffusions to SGD on the least squares problem (2.1) f(x) = ½‖Ax − b‖², we will use the singular value decomposition UΣVᵀ of A. We order the singular values σ₁ ≥ σ₂ ≥ σ₃ ≥ … in decreasing order. We then let t = VT(Xt ex), where we recall that b = Aex+ . We may do a similar computation with N and conclude that: J(1) = 2 2 2jJ 2 1 '(t) '(s)d s,j In summary, we may express J in terms of N by J(1) = 2 2 2jJ 1 '2(t) N(1) + 22 dh t,jiwith J(0) = EH When (k,n)= k+n and thus '(t)=(1+ t) with (t)= 1+t, the corresponding ODE is precisely bJ(3) The other case is when (k,n)= n, or '(t)=exp( t). We call this the general SDAHB; one recovers SDAHB when 1 =, 2 =0, and = .

artificial intelligence, equation, machine learning, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.54)

f2543511e5f4d4764857f9ad833a977d-Paper-Conference.pdf

Neural Information Processing Systems | Apr-30-2026, 06:56:36 GMT

machine learning, natural language, restart, (19 more...)

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

299a08ee712d4752c890938da99a77c6-Paper-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 05:28:58 GMT

artificial intelligence, equilibrium point, machine learning, (18 more...)

Country: Asia > China (0.14)

Genre: Research Report (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

A Variational Perspective on High-Resolution ODEs

Neural Information Processing Systems | Apr-24-2026, 07:54:10 GMT

We consider unconstrained minimization of smooth convex functions. We propose a novel variational perspective using a forced Euler-Lagrange equation that allows for studying high-resolution ODEs. Through this, we obtain a faster convergence rate for gradient norm minimization using Nesterov's accelerated gradient method. Additionally, we show that Nesterov's method can be interpreted as a rate-matching discretization of an appropriately chosen high-resolution ODE. Finally, using the results from the new variational perspective, we propose a stochastic method for noisy gradients.
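For readers unfamiliar with the method named in this abstract, the following is a minimal sketch of the textbook form of Nesterov's accelerated gradient method on a small quadratic. It is not the paper's variational analysis or its stochastic method. The objective, the diagonal `A_DIAG`, the momentum schedule `(k - 1) / (k + 2)`, and all function names are assumptions chosen for illustration.

```python
import math

# Hypothetical test objective: f(x) = 0.5 * sum(a_i * x_i**2).
# Its gradient is Lipschitz with constant L = max(a_i).
A_DIAG = [1.0, 0.1, 0.01]
L_SMOOTH = max(A_DIAG)


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A_DIAG, x))


def grad_f(x):
    return [a * xi for a, xi in zip(A_DIAG, x)]


def grad_norm(x):
    return math.sqrt(sum(g * g for g in grad_f(x)))


def nesterov(x0, iters=100):
    """Textbook Nesterov iteration with step size 1/L (illustrative only)."""
    x_prev = list(x0)
    x = list(x0)
    for k in range(1, iters + 1):
        beta = (k - 1) / (k + 2)  # standard momentum schedule (assumed here)
        # Extrapolate along the previous displacement.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad_f(y)
        x_prev = x
        # Gradient step taken from the extrapolated point.
        x = [yi - gi / L_SMOOTH for yi, gi in zip(y, g)]
    return x
```

Calling `nesterov([1.0, 1.0, 1.0])` and comparing `grad_norm` before and after gives a quick look at the gradient-norm decrease that the abstract is concerned with.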

algorithm, artificial intelligence, machine learning, (17 more...)

Country: Asia (0.47)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Two excellent new sci-fi novels tackle robots in very different ways

New Scientist | Apr-8-2026, 18:00:26 GMT

Luminous by Silvia Park and Ode to the Half-Broken by Suzanne Palmer are both thoughtful and well-written science fiction novels, featuring robots in richly realised worlds. But there the similarities end, says Emily H. Wilson. Do we relate better to stories about robots with faces and bodies? Robots and whether they will one day deserve to be treated like people - or destroy humanity, or both - have interested writers for well over a century now. In the real world, the robot threat appears to involve the uses of artificial intelligence in misinformation and more direct forms of warfare such as drone attacks. In the world of literature, however, many writers focus on individual robots.

artificial intelligence, close advertisement skip, robot, (12 more...)

Country:

Europe > Ukraine > Kyiv Oblast > Chernobyl (0.05)
Asia > Middle East > Iran (0.05)

Industry:

Marketing (0.42)
Media (0.35)
Transportation (0.31)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.35)

Adaptive Averaging in Accelerated Descent Dynamics

Walid Krichene, Alexandre Bayen, Peter L. Bartlett

Neural Information Processing Systems | Mar-23-2026, 02:33:53 GMT

We study accelerated descent dynamics for constrained convex optimization. This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate η(t), and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights w(t). Using a Lyapunov argument, we give sufficient conditions on η and w to achieve a desired convergence rate. As an example, we show that the replicator dynamics (an example of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic which adaptively computes the weights to speed up the decrease of the Lyapunov function. We provide guarantees on adaptive averaging in continuous-time, prove that it preserves the quadratic convergence rate of accelerated first-order methods in discrete-time, and give numerical experiments to compare it with existing heuristics, such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases.
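The coupling described in this abstract (a dual variable accumulating gradients at rate η(t), and a primal variable formed as a weighted average of the mirrored dual trajectory with weights w(t)) can be sketched with a naive forward-Euler discretization on the simplex. This is only an illustration under stated assumptions: the softmax mirror map, the choices η(t) = t and w(t) = t, the step size `h`, and every function name are hypothetical, and the sketch does not implement the paper's adaptive averaging heuristic or carry its guarantees.

```python
import math


def softmax(z):
    """Entropic mirror map onto the simplex (numerically stabilized)."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def averaged_mirror_dynamics(grad, dim, steps=200, h=0.1):
    """Naive Euler sketch of dual accumulation plus weighted primal averaging."""
    z = [0.0] * dim          # dual variable: accumulates scaled negative gradients
    x = [1.0 / dim] * dim    # primal variable: starts at the uniform distribution
    total_w = 0.0
    for k in range(1, steps + 1):
        t = k * h
        eta, w = t, t        # assumed rate eta(t) and weight w(t), for illustration
        g = grad(x)
        z = [zi - h * eta * gi for zi, gi in zip(z, g)]
        mirrored = softmax(z)
        # Running weighted average of the mirrored dual trajectory.
        total_w += h * w
        lam = h * w / total_w
        x = [(1.0 - lam) * xi + lam * mi for xi, mi in zip(x, mirrored)]
    return x
```

For a linear objective with cost vector `c`, passing `grad=lambda x: c` drives the average toward the vertex of the simplex with the smallest cost, while every iterate remains a probability vector because each update is a convex combination.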

artificial intelligence, machine learning, trajectory, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

On the Curved Geometry of Accelerated Optimization

Aaron Defazio

Neural Information Processing Systems | Feb-14-2026, 19:36:14 GMT

Neural Information Processing Systems http://nips.cc/

gradient method, manifold, proximal point method, (13 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

cab070d53bd0d200746fb852a922064a-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-14-2026, 03:04:37 GMT

final version, generalisation error, reviewer, (8 more...)

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Direct Runge-Kutta Discretization Achieves Acceleration

Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

Neural Information Processing Systems | Feb-12-2026, 17:26:44 GMT

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method.
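As a rough illustration of what directly discretizing such an ODE can look like, the sketch below applies the classical fourth-order Runge-Kutta scheme to x'' + (3/t) x' + ∇f(x) = 0, a commonly cited continuous limit of Nesterov's method. The abstract does not state which ODE or integrator the paper uses, so this ODE, the one-dimensional quadratic objective, the start time t0 = 1 (chosen to avoid the singularity at t = 0), and the step size are all assumptions.

```python
def f(x):
    # Hypothetical objective for illustration: f(x) = 0.5 * x**2.
    return 0.5 * x * x


def grad_f(x):
    return x


def rhs(t, x, v):
    """First-order form of x'' + (3/t) x' + grad_f(x) = 0, with v = x'."""
    return v, -(3.0 / t) * v - grad_f(x)


def rk4_step(t, x, v, h):
    """One classical RK4 step for the coupled (x, v) system."""
    k1x, k1v = rhs(t, x, v)
    k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
    k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
    k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
    x_new = x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    v_new = v + h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x_new, v_new


def integrate(x0=1.0, v0=0.0, t0=1.0, h=0.05, steps=400):
    """Integrate from t0 for a fixed number of steps; returns (t, x, v)."""
    t, x, v = t0, x0, v0
    for _ in range(steps):
        x, v = rk4_step(t, x, v, h)
        t += h
    return t, x, v
```

Each RK4 step costs four gradient evaluations, so a comparison against a one-gradient-per-step method such as Nesterov's should count gradient calls rather than steps.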

artificial intelligence, integrator, optimization problem, (18 more...)

Country: