AITopics | minima

Neural Information Processing Systems http://nips.cc/

artificial intelligence, asymmetric valley, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Preconditioning Matters: Fast Global Convergence of Non-convex Matrix Factorization via Scaled Gradient Descent

Neural Information Processing SystemsApr-30-2026, 06:24:27 GMT

Low-rank matrix factorization (LRMF) is a canonical problem in non-convex optimization, the objective function to be minimized is non-convex and even non-smooth, which makes the global convergence guarantee of gradient-based algorithm quite challenging.

artificial intelligence, initialization, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.45)

Add feedback

357cfba15668cc2e1e73111e09d54383-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 11:01:41 GMT

artificial intelligence, learning, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre:

Research Report > Promising Solution (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

The alignment property of SGD noise and how it helps select flat minima: A stability analysis

Neural Information Processing SystemsApr-25-2026, 00:09:25 GMT

The phenomenon that stochastic gradient descent (SGD) favors flat minima has played a critical role in understanding the implicit regularization of SGD. In this paper, we provide an explanation of this striking phenomenon by relating the particular noise structure of SGD to its linear stability (Wu et al., 2018). Specifically, we consider training over-parameterized models with square loss. We prove that if a global minimum θ is linearly stable for SGD, then it must satisfy H(θ) F O( B/η), where H(θ) F,B,η denote the Frobenius norm of Hessian at θ, batch size, and learning rate, respectively. Otherwise, SGD will escape from that minimum exponentially fast. Hence, for minima accessible to SGD, the sharpness--as measured by the Frobenius norm of the Hessian--is bounded independently of the model size and sample size. The key to obtaining these results is exploiting the particular structure of SGD noise: The noise concentrates in sharp directions of local landscape and the magnitude is proportional to loss value. This alignment property of SGD noise provably holds for linear networks and random feature models (RFMs), and is empirically verified for nonlinear networks. Moreover, the validity and practical relevance of our theoretical findings are also justified by extensive experiments on CIFAR-10 dataset.

artificial intelligence, machine learning, sgd, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Add feedback

01ded4259d101feb739b06c399e9cd9c-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 10:11:38 GMT

artificial intelligence, machine learning, representation, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

b139e104214a08ae3f2ebcce149cdf6e-Paper.pdf

Neural Information Processing SystemsApr-21-2026, 20:09:21 GMT

artificial intelligence, local minima, machine learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Add feedback

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

Huang, Jie, Loureiro, Bruno, Mannelli, Stefano Sarao

arXiv.org Machine LearningApr-13-2026

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.

artificial intelligence, co 1, machine learning, (18 more...)

arXiv.org Machine Learning

2604.09412

Country:

North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > France (0.04)
Asia > Singapore (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Where Do Large Learning Rates Lead Us?

Neural Information Processing SystemsMar-21-2026, 00:26:38 GMT

It is generally accepted that starting neural networks training with large learning rates (LRs) improves generalization. Following a line of research devoted to understanding this effect, we conduct an empirical study in a controlled setting focusing on two questions: 1) how large an initial LR is required for obtaining optimal quality, and 2) what are the key differences between models trained with different LRs? We discover that only a narrow range of initial LRs slightly above the convergence threshold lead to optimal results after fine-tuning with a small LR or weight averaging. By studying the local geometry of reached minima, we observe that using LRs from this optimal range allows for the optimization to locate a basin that only contains high-quality minima. Additionally, we show that these initial LRs result in a sparse set of learned features, with a clear focus on those most relevant for the task. In contrast, starting training with too small LRs leads to unstable minima and attempts to learn all features simultaneously, resulting in poor generalization. Conversely, using initial LRs that are too large fails to detect a basin with good solutions and extract meaningful patterns from the data.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Add feedback

Make Continual Learning Stronger via C-Flat

Neural Information Processing SystemsMar-18-2026, 02:45:44 GMT

How to balance the learning'sensitivity-stability' upon new task training and memory preserving is critical in CL to resolve catastrophic forgetting. Improving model generalization ability within each learning phase is one solution to help CL learning overcome the gap in the joint knowledge space. Zeroth-order loss landscape sharpness-aware minimization is a strong training regime improving model generalization in transfer learning compared with optimizer like SGD. It has also been introduced into CL to improve memory representation or learning efficiency. However, zeroth-order sharpness alone could favors sharper over flatter minima in certain scenarios, leading to a rather sensitive minima rather than a global optima. To further enhance learning stability, we propose a Continual Flatness (C-Flat) method featuring a flatter loss landscape tailored for CL. C-Flat could be easily called with only one line of code and is plug-and-play to any CL methods. A general framework of C-Flat applied to all CL categories and a thorough comparison with loss minima optimizer and flat minima based CL approaches is presented in this paper, showing that our method can boost CL performance in almost all cases. Code is available at https://github.com/WanNaa/C-Flat.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Add feedback

A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization

Neural Information Processing SystemsMar-17-2026, 00:03:03 GMT

We study the set of continuous functions that admit no spurious local optima (i.e.

artificial intelligence, optimization problem, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.45)

Add feedback

Filters

Collaborating Authors

minima

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Asymmetric Valleys: Beyond Sharp and Flat Local Minima

Preconditioning Matters: Fast Global Convergence of Non-convex Matrix Factorization via Scaled Gradient Descent

357cfba15668cc2e1e73111e09d54383-Paper.pdf

The alignment property of SGD noise and how it helps select flat minima: A stability analysis

01ded4259d101feb739b06c399e9cd9c-Paper.pdf

b139e104214a08ae3f2ebcce149cdf6e-Paper.pdf

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

Where Do Large Learning Rates Lead Us?

Make Continual Learning Stronger via C-Flat

A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization