AITopics


Neural Information Processing Systems, May-1-2026, 06:27:05 GMT

4cf0ed8641cfcbbf46784e620a0316fb-Paper.pdf

algorithm, artificial intelligence, machine learning, (16 more...)

Country:

North America > United States (0.28)
North America > Canada > Quebec > Montreal (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Neural Information Processing Systems, Feb-9-2026, 01:46:35 GMT

4+3 Phases of Compute-Optimal Neural Scaling Laws

We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of these phases, in particular computing the optimal model-parameter-count as a function of floating point operation budget.

artificial intelligence, fpp, machine learning, (15 more...)

Country:

North America > Canada > Quebec (0.04)
North America > United States (0.04)
Europe > United Kingdom (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-8-2026, 14:05:18 GMT

4cf0ed8641cfcbbf46784e620a0316fb-Paper.pdf

algorithm, arxiv preprint arxiv, sgd, (14 more...)

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Lang, Quanjun, Lu, Jianfeng

Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels

arXiv.org Machine Learning, Dec-12-2025

We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, under a strongly convex potential, trajectory discrepancies decay at a rate determined by the decay of the memory kernel and are quantitatively bounded by the estimation error of the kernel in a weighted norm. Our analysis integrates synchronized noise coupling with a Volterra comparison theorem, encompassing both subexponential and exponential kernel classes. For first-order models, we derive moment and perturbation bounds using resolvent estimates in weighted spaces. For second-order models with confining potentials, we prove contraction and stability under kernel perturbations using a hypocoercive Lyapunov-type distance. This framework accommodates non-translation-invariant kernels and white-noise forcing, explicitly linking improved kernel estimation to enhanced trajectory prediction. Numerical examples validate these theoretical findings.
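As a rough illustration of the synchronized noise coupling mentioned in the abstract, the sketch below simulates a second-order GLE twice with the same noise realization, once with a reference kernel and once with a perturbed kernel, and reports the largest gap between the two position trajectories. Everything specific here is an assumption made for illustration and is not taken from the paper: the quadratic potential V(x) = x^2/2, the exponential kernel, the Euler-Maruyama scheme with a Riemann-sum memory term, and the hypothetical names `simulate_gle` and `trajectory_gap`.

```python
import numpy as np

def simulate_gle(kernel, noise, dt, sigma=0.5, x0=1.0):
    """Euler-Maruyama sketch of x' = v, v' = -x - int_0^t K(t-s) v(s) ds + white noise."""
    n_steps = len(noise)
    x = np.zeros(n_steps + 1)
    v = np.zeros(n_steps + 1)
    x[0] = x0
    k = kernel(dt * np.arange(n_steps + 1))  # kernel values on the time grid
    for n in range(n_steps):
        # Left Riemann sum for the memory integral over s in [0, t_n)
        memory = dt * np.dot(k[n:0:-1], v[:n])
        x[n + 1] = x[n] + dt * v[n]
        v[n + 1] = v[n] + dt * (-x[n] - memory) + sigma * noise[n]
    return x

def trajectory_gap(kernel_true, kernel_est, n_steps=1000, dt=0.01, seed=0):
    """Largest position gap between two runs that share the same noise."""
    rng = np.random.default_rng(seed)
    noise = np.sqrt(dt) * rng.standard_normal(n_steps)  # shared Brownian increments
    x_true = simulate_gle(kernel_true, noise, dt)
    x_est = simulate_gle(kernel_est, noise, dt)
    return float(np.max(np.abs(x_true - x_est)))

def true_kernel(t):
    return np.exp(-t)

if __name__ == "__main__":
    for eps in (0.0, 0.05, 0.2):
        # Assumed perturbation: a multiplicative error on the kernel amplitude
        est_kernel = lambda t, eps=eps: (1.0 + eps) * np.exp(-t)
        print(eps, trajectory_gap(true_kernel, est_kernel))
```

Sharing the noise removes the stochastic part of the difference, so the remaining gap is driven only by the kernel error. This is the quantity the abstract says is bounded by the kernel estimation error in a weighted norm, although this toy script does not verify any particular bound.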

equation, kernel, theorem 2, (16 more...)

2512.10256

Country:

North America > United States > North Carolina > Durham County > Durham (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.34)

Neural Information Processing Systems, Aug-19-2025, 18:02:17 GMT

efcb76ac1df9231a24893a957fcb9001-Paper-Conference.pdf

artificial intelligence, deep learning, machine learning, (17 more...)

Country:

North America > Canada > Quebec > Montreal (0.15)
North America > United States > New York (0.04)
Europe > Russia (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ferbach, Damien, Everett, Katie, Gidel, Gauthier, Paquette, Elliot, Paquette, Courtney

Dimension-adapted Momentum Outscales SGD

arXiv.org Machine Learning, May-23-2025

We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions and large-scale text experiments with LSTMs show DANA's improved loss exponents over SGD hold in a practical setting.

large language model, machine learning, natural language, (20 more...)

2505.16098

Country:

North America > Canada > Quebec > Montreal (0.13)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report > New Finding (0.45)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

Paquette, Elliot, Paquette, Courtney, Xiao, Lechao, Pennington, Jeffrey

4+3 Phases of Compute-Optimal Neural Scaling Laws

arXiv.org Machine Learning, May-23-2024

We consider the three parameter solvable neural scaling model introduced by Maloney, Roberts, and Sully. The model has three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves which holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in the data-complexity/target-complexity phase-plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of these phases, in particular computing the optimal model-parameter-count as a function of floating point operation budget.
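The training setup in the abstract (one-pass SGD on a mean-squared loss, with data complexity, target complexity, and model-parameter-count as the three parameters) can be sketched in a few lines. The concrete parameterization below is an assumption for illustration and should be checked against the paper: inputs with coordinate variances j^(-2*alpha), target coefficients j^(-beta), and a Gaussian random embedding `W` with entry variance 1/d. The function name `plrf_one_pass_sgd`, the step size, and the dimensions are likewise hypothetical.

```python
import numpy as np

def plrf_one_pass_sgd(alpha=1.0, beta=0.5, v=200, d=50, steps=2000, lr=0.2, seed=0):
    """One-pass SGD on a power-law random features model (illustrative sketch).

    alpha: data complexity, beta: target complexity, d: model-parameter-count.
    Returns the exact population loss after every step.
    """
    rng = np.random.default_rng(seed)
    j = np.arange(1, v + 1, dtype=float)
    scales = j ** (-alpha)   # standard deviation of input coordinate j (assumed)
    b = j ** (-beta)         # target coefficients (assumed)
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(v, d))  # random embedding
    theta = np.zeros(d)

    def population_loss(th):
        # E[(<W^T x, th> - <x, b>)^2] for x with independent N(0, scales^2) entries
        r = scales * (W @ th - b)
        return float(r @ r)

    losses = [population_loss(theta)]
    for _ in range(steps):
        x = scales * rng.standard_normal(v)   # fresh sample every step (one pass)
        z = W.T @ x                           # embedded features
        residual = z @ theta - x @ b
        theta -= lr * residual * z            # SGD step on half the squared error
        losses.append(population_loss(theta))
    return np.array(losses)

if __name__ == "__main__":
    curve = plrf_one_pass_sgd()
    print(curve[0], curve[-1])
```

Because each step draws a new sample, the iteration count equals the number of samples seen. Under the usual convention that compute is proportional to steps times d (an assumption here), sweeping `d` at a fixed budget gives a crude numerical view of the compute-optimal trade-off the abstract analyzes.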

compute-optimal curve, proposition 10, volterra equation, (15 more...)

2405.15074

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > New York > Nassau County > Mineola (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Lang, Quanjun, Lu, Jianfeng

Learning Memory Kernels in Generalized Langevin Equations

arXiv.org Machine Learning, Feb-18-2024

We introduce a novel approach for learning memory kernels in Generalized Langevin Equations. This approach initially utilizes a regularized Prony method to estimate correlation functions from trajectory data, followed by regression over a Sobolev norm-based loss function with RKHS regularization. Our approach guarantees improved performance within an exponentially weighted $L^2$ space, with the kernel estimation error controlled by the error in estimated correlation functions. We demonstrate the superiority of our estimator compared to other regression estimators that rely on $L^2$ loss functions and also an estimator derived from the inverse Laplace transform, using numerical examples that highlight its consistent advantage across various weight parameter selections. Additionally, we provide examples that include the application of force and drift terms in the equation.
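For readers unfamiliar with the Prony step, the sketch below shows the classical, unregularized Prony method, which fits a sum of exponentials to uniformly sampled data. It is not the regularized variant the abstract refers to, and the function name `prony` and the test signal are hypothetical. The method solves a linear-prediction least-squares problem, takes the roots of the resulting characteristic polynomial as the exponential bases, and then fits the amplitudes by a second least-squares solve.

```python
import numpy as np

def prony(samples, p):
    """Classical Prony fit: samples[n] ~ sum_k amps[k] * roots[k]**n with p terms."""
    y = np.asarray(samples, dtype=float)
    N = len(y)
    # Linear prediction: y[n] + a_1 y[n-1] + ... + a_p y[n-p] = 0 for n = p..N-1
    A = np.column_stack([y[p - k : N - k] for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(A, -y[p:], rcond=None)
    # Roots of z^p + a_1 z^(p-1) + ... + a_p are the exponential bases
    roots = np.roots(np.concatenate(([1.0], a)))
    # Vandermonde system for the amplitudes: V[n, k] = roots[k]**n
    V = np.vander(roots, N, increasing=True).T
    amps, *_ = np.linalg.lstsq(V, y.astype(complex), rcond=None)
    return roots, amps

if __name__ == "__main__":
    n = np.arange(20)
    signal = 2.0 * 0.9 ** n + 1.0 * 0.5 ** n   # noise-free two-term test signal
    roots, amps = prony(signal, 2)
    print(roots, amps)
```

For samples taken at spacing `dt`, the continuous-time decay rates follow as `-np.log(roots) / dt`. Classical Prony is known to be sensitive to noise, which is one plausible motivation for a regularized variant such as the one the abstract describes.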

equation, laplace transform, loss function, (14 more...)

2402.11705

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.34)