AITopics | spectral radius

Stochastic momentum methods such as heavy ball (HB), Nesterov momentum, and variants of Accelerated SGD (ASGD) [Kidambi et al., 2018] are widely used in modern training, but their stochastic benefits depend on two distinct quantities: serial runtime, the number of iterations needed to reach a target accuracy, and compute efficiency (CE), the inverse total gradient-query or FLOP cost. Larger batches reduce serial runtime without hurting CE only when the contraction gap grows linearly with batch size. We study stochastic HB and ASGD for consistent linear regression with Gaussian covariates and prove finite-dimensional, discrete-time lower bounds on their batch-size tradeoffs. Our first result shows that HB does not improve the CE frontier over SGD for arbitrary spectra; rather, it preserves SGD-level CE over a larger batch-size window, allowing larger batches to reduce serial runtime until HB reaches its deterministic accelerated scale. This window can be a factor $\sqrtκ$ larger than the SGD critical batch size. For ASGD, the picture is more spectrum-dependent: for rapidly decaying power-law spectra, ASGD improves small-batch CE over HB/SGD, but as batch size grows it trades this CE advantage for improved serial runtime. Synthetic linear-regression experiments verify these qualitative regimes, including near-overlap of ASGD and HB for slowly decaying spectra and the predicted CE--serial tradeoff for rapidly decaying spectra.

artificial intelligence, eigenvalue, machine learning, (15 more...)

arXiv.org Machine Learning

2606.19179

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Revisiting Glorot Initialization for Long-Range Linear Recurrences

Neural Information Processing SystemsJun-17-2026, 05:46:14 GMT

Proper initialization is critical for Recurrent Neural Networks (RNNs), particularly in long-range reasoning tasks, where repeated application of the same weight matrix can cause vanishing or exploding signals. A common baseline for linear recurrences is Glorot initialization, designed to ensure stable signal propagation--but derived under the infinite-width, fixed-length regime--an unrealistic setting for RNNs processing long sequences. In this work, we show that Glorot initialization is in fact unstable: small positive deviations in the spectral radius are amplified through time and cause the hidden state to explode. Our theoretical analysis demonstrates that sequences of length t = O( n), where n is the hidden width, are sufficient to induce instability. To address this, we propose a simple, dimension-aware rescaling of Glorot that shifts the spectral radius slightly below one, preventing rapid signal explosion or decay. These results suggest that standard initialization schemes may break down in the long-sequence regime, motivating a separate line of theory for stable recurrent initialization.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Demystifying Oversmoothing in Attention-Based Graph Neural Networks

Neural Information Processing SystemsApr-28-2026, 13:11:06 GMT

Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers.

artificial intelligence, machine learning, matrix, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Stability of Sequential and Parallel Coordinate Ascent Variational Inference

Pati, Debdeep

arXiv.org Machine LearningMar-24-2026

We highlight a striking difference in behavior between two widely used variants of coordinate ascent variational inference: the sequential and parallel algorithms. While such differences were known in the numerical analysis literature in simpler settings, they remain largely unexplored in the optimization-focused literature on variational inference in more complex models. Focusing on the moderately high-dimensional linear regression problem, we show that the sequential algorithm, although typically slower, enjoys convergence guarantees under more relaxed conditions than the parallel variant, which is often employed to facilitate block-wise updates and improve computational efficiency.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2603.20929

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods

Antoine Gautier, Quynh N. Nguyen, Matthias Hein

Neural Information Processing SystemsMar-23-2026, 03:36:26 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, neural network, (12 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Add feedback

Local Convergence of Gradient Methods for Min-Max Games under Partial Curvature

Guillaume Wang, Lénaïc Chizat

Neural Information Processing SystemsFeb-16-2026, 21:20:33 GMT

artificial intelligence, eigenvalue, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

6e4cdfdd909ea4e34bfc85a12774cba0-Paper-Conference.pdf

Neural Information Processing SystemsFeb-13-2026, 21:39:55 GMT

artificial intelligence, machine learning, matrix, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Texas (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

74fc5575632191d96881d8015f79dde3-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 20:29:23 GMT

constraint, dag constraint, gradient, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Data Science (0.93)

Add feedback

42ae1544956fbe6e09242e6cd752444c-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 05:26:27 GMT

activation, mis, mis 0, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.40)

Add feedback

When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo A. Parrilo

Neural Information Processing SystemsNov-21-2025, 04:32:09 GMT

The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications.

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Filters

Collaborating Authors

spectral radius

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

Revisiting Glorot Initialization for Long-Range Linear Recurrences

Demystifying Oversmoothing in Attention-Based Graph Neural Networks

Stability of Sequential and Parallel Coordinate Ascent Variational Inference

Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods

Local Convergence of Gradient Methods for Min-Max Games under Partial Curvature

6e4cdfdd909ea4e34bfc85a12774cba0-Paper-Conference.pdf

74fc5575632191d96881d8015f79dde3-Paper-Conference.pdf

42ae1544956fbe6e09242e6cd752444c-Supplemental.pdf

When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo A. Parrilo