AITopics

2605.26654

Country: Asia (0.27)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Malaxechebarría, Begoña García, Paquette, Courtney, Fazel, Maryam, Drusvyatskiy, Dmitriy

High-dimensional Limit of SGD for Diagonal Linear Networks

arXiv.org Machine LearningMay-19-2026

Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic dynamics are globally well posed and converge exponentially fast to zero risk with high probability, yielding a fully explicit non-asymptotic description of their long-time behavior. Numerical simulations corroborate our theoretical findings.

artificial intelligence, deep learning, machine learning, (15 more...)

2605.17177

Country: North America > United States > New York (0.27)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Nicoletti, Flavio, Ma, Chenxiao, Ventura, Enrico, Saglietti, Luca, Mannelli, Stefano Sarao

The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models

arXiv.org Machine LearningMay-8-2026

Real-world datasets are inherently heterogeneous, yet how per-class structural differences and sampling imbalance shape the training dynamics of diffusion models-and potentially exacerbate disparities-remains poorly understood. While models typically transition from an initial phase of generalization to memorizing the training set, existing theory assumes homogeneous data, leaving open how class imbalance and heterogeneity reshape these dynamics. In this work, we develop a high-dimensional analytical framework to study class-dependent learning in score-based diffusion models. Analyzing a random-features model trained on Gaussian mixtures, we derive the feature-covariance spectrum to characterize per-class generalization and memorization times. We reveal the explicit hierarchy governing these dynamics: class variance is the primary determinant of learning order-consistently favoring higher-variance classes-while centroid geometry plays a secondary role. Sampling imbalance acts as a modulator that can reverse this ordering and, under strong imbalance, forces minority classes to acquire distinct, delayed speciation times during backward diffusion. Together, these results suggest that diffusion models can memorize some classes while others remain insufficiently learned. We validate our theoretical predictions empirically using U-Net models trained on Fashion MNIST.

artificial intelligence, data structure, machine learning, (14 more...)

2605.06367

Country: Europe > Italy (0.28)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Omiya, Gaku, Poirion, Pierre-Louis, Takeda, Akiko

Randomized Subspace Nesterov Accelerated Gradient

arXiv.org Machine LearningMay-4-2026

Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information, a feature that is attractive in forward-mode automatic differentiation and communication-limited settings. While Nesterov acceleration is well understood for full-gradient and coordinate-based methods, obtaining accelerated methods for general subspace sketches that use only projected-gradient information and can improve over full-dimensional Nesterov acceleration in oracle complexity is technically nontrivial. We develop randomized-subspace Nesterov accelerated gradient methods for smooth convex and smooth strongly convex optimization under matrix smoothness and generic sketch moment assumptions. The key technical ingredient is a three-sequence formulation tailored to matrix smoothness, which recovers the corresponding classical Nesterov methods in the full-dimensional case. The resulting theory establishes accelerated oracle-complexity guarantees and makes explicit how matrix smoothness and the sketch distribution enter the complexity. It also provides a unified basis for comparing sketch families and identifying when randomized-subspace acceleration improves over full-dimensional Nesterov acceleration in oracle complexity.

artificial intelligence, machine learning, sketch, (17 more...)

2605.0074

Country:

Asia > Japan (0.28)
North America > United States (0.28)

Genre: Research Report (0.83)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Neural Information Processing SystemsMay-1-2026, 01:41:33 GMT

Supplementary to Smooth Bilevel Programming for Sparse Regularization Clarice Poon, Gabriel Peyré APseudocode for gradient descent implementation

Note that f(βt) = gt is computed either as in line 5 or line 9 of the algorithm and one can use these computations for any gradient based algorithm (e.g. Note also that this is simply gradient descent on a smooth function, and one can apply typical methods to choosing the stepsize γk, such as the Barzilai-Borwein stepsize [Barzilai and Borwein, 1988]. Algorithm 1: Gradient descent implementation of Ncvx-Pro for solving Lasso. 1 initialization v0 Rn (with no zero entries), stepsize γt > 0; Result: βt 2 while not converged do 3 if n6 mand λ>0 then 4 ut = diag(vt)X>Xdiag(vt) + λId To show that i) implies ii), recall that a convex, proper and lower semicontinuous function ϕ can be written in terms of its convex conjugate which has domain Rd . For the expression of ψwhen Ris a norm,from the above, we know that ψ = ( ϕ) ( z), and recall that for any norm, R(β) = maxR (w)61hw, βi. We derive some properties of the function h: Lemma 1.

algorithm, artificial intelligence, machine learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.81)

Neural Information Processing SystemsApr-29-2026, 19:19:25 GMT

Neural Circuits for Fast Poisson Compressed Sensing in the Olfactory Bulb

Within a single sniff, the mammalian olfactory system can decode the identity and concentration of odorants wafted on turbulent plumes of air. Yet, it must do so given access only to the noisy, dimensionally-reduced representation of the odor world provided by olfactory receptor neurons. As a result, the olfactory system must solve a compressed sensing problem, relying on the fact that only a handful of the millions of possible odorants are present in a given scene. Inspired by this principle, past works have proposed normative compressed sensing models for olfactory decoding. However, these models have not captured the unique anatomy and physiology of the olfactory bulb, nor have they shown that sensing can be achieved within the 100-millisecond timescale of a single sniff. Here, we propose a rate-based Poisson compressed sensing circuit model for the olfactory bulb.

artificial intelligence, machine learning, odorant, (18 more...)

Country: North America > United States > Massachusetts > Middlesex County (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Neural Information Processing SystemsApr-25-2026, 18:24:44 GMT

4ae67a7dd7e491f8fb6f9ea0cf25dfdb-Supplemental.pdf

aa null, artificial intelligence, machine learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Neural Information Processing SystemsApr-25-2026, 16:21:47 GMT

Supplementary Material: Memory-Efficient Approximation Algorithms for MAX-K-CUT and Correlation Clustering

Let ϑ Rd1 and µ Rd2 be the dual variables corresponding to the d1 equality constraints and the d2 inequality constraints respectively. Let X? be an optimal solution to (SDP) and let X?FW be an optimal solution to (SDP-LSE). For ease of notation, let u= A(1)(X) b(1) andv = b(2) A(2)(X), (1) and define (bu,bv), (uFW,vFW) and (u?,v?) by substituting bX, XFW and X? respectively in (1). Upper bound on the objective. Rearranging the terms, using the duality of the `1 and ` norms, and the fact that µ? 0, gives hC, bX i hC,X?i+

artificial intelligence, diag, machine learning, (17 more...)