A.2 Derivations for Section 3.1

We begin with a formal derivation of the formulas in Section 3.1. Recall that we consider a function F(θ) whose parameters can be split into n scale-invariant (SI) groups: θ = (θ1, ..., θn). We solve an optimization problem (1) with projected gradient descent (2).

Remark 2. The above formulation allegedly lacks the third (divergent) regime. If, conversely, η > 1 / (∑_{i=1}^n α_i), then at each iteration at least one of the individual ELRs exceeds its convergence threshold: η_i > 1/α_i.
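The projected-gradient setup above can be sketched numerically. Below is a minimal sketch of projected gradient descent on the unit sphere with a fixed ELR; the toy objective F(θ) = −(uᵀθ)² / ‖θ‖² is our own illustrative choice (not necessarily the paper's example), picked because it is scale invariant and minimized at θ = ±u.

```python
import numpy as np

def project(theta):
    """Project parameters back onto the unit sphere."""
    return theta / np.linalg.norm(theta)

def pgd_on_sphere(grad_f, theta0, elr, steps):
    """Projected gradient descent with a fixed effective learning rate (ELR).

    Each step moves along the component of the gradient tangent to the
    sphere, then re-projects onto the unit sphere.
    """
    theta = project(theta0)
    for _ in range(steps):
        g = grad_f(theta)
        g_tan = g - theta * (theta @ g)  # keep only the tangential component
        theta = project(theta - elr * g_tan)
    return theta

# Toy scale-invariant objective (illustrative, not from the paper):
# on the unit sphere, F(theta) = -(u . theta)^2, minimized at theta = ±u.
u = np.zeros(5)
u[0] = 1.0
grad_f = lambda th: -2.0 * (u @ th) * u

theta_star = pgd_on_sphere(grad_f, np.ones(5), elr=0.1, steps=500)
print(abs(theta_star[0]))  # approaches 1 in the convergence regime
```

With a small fixed ELR this lands in the convergence regime; pushing the ELR up is how the chaotic-equilibrium and divergent regimes discussed in the paper are probed.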
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. However, the varying ELR may obscure certain characteristics of the intrinsic loss landscape structure. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail both through a theoretical examination of a toy example and through a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on both regular and scale-invariant neural network training. Finally, we demonstrate how the discovered regimes are reflected in conventional training of normalized networks and how they can be leveraged to achieve better optima.
Efficient Hyperparameter Tuning via Trajectory Invariance Principle
Bingrui Li, Jiaxin Wen, Zhanpeng Zhou, Jun Zhu, Jianfei Chen
As hyperparameter tuning becomes increasingly costly at scale, efficient tuning methods are essential. Yet principles for guiding hyperparameter tuning remain limited. In this work, we seek to establish such principles by considering a broad range of hyperparameters, including batch size, learning rate, and weight decay. We identify a phenomenon we call trajectory invariance, where pre-training loss curves, gradient noise, and gradient norm exhibit invariance--closely overlapping--with respect to a quantity that combines learning rate and weight decay. This phenomenon effectively reduces the original two-dimensional hyperparameter space to one dimension, yielding an efficient tuning rule: follow the salient direction revealed by trajectory invariance. Furthermore, we refine previous scaling laws and challenge several existing viewpoints. Overall, our work proposes new principles for efficient tuning and inspires future research on scaling laws.
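The reduction from two hyperparameter dimensions to one can be illustrated mechanically. In the sketch below we assume, purely for illustration, that the combined quantity is the product lr × wd (the paper's actual invariant quantity may differ): grouping a 2-D (learning-rate, weight-decay) grid by this quantity leaves one representative run per distinct value, which is then tuned along the remaining one-dimensional direction.

```python
import itertools

# Hypothetical invariant: assume trajectories depend on lr and wd only
# through k = lr * wd (an illustrative assumption, not the paper's
# exact definition of the combined quantity).
def invariant_key(lr, wd):
    return round(lr * wd, 10)  # round away floating-point noise

lrs = [1e-3, 3e-3, 1e-2]
wds = [1e-2, 3e-2, 1e-1]

groups = {}
for lr, wd in itertools.product(lrs, wds):
    groups.setdefault(invariant_key(lr, wd), []).append((lr, wd))

# Instead of len(lrs) * len(wds) runs over the full grid, tune one
# run per distinct invariant value.
for k in sorted(groups):
    print(f"k = {k:.0e}: equivalent configs {groups[k]}")
```

Here the 3 × 3 = 9 grid collapses to 6 distinct invariant values, and the tuning rule amounts to sweeping along those values rather than the full grid.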
A Theory

In this section, we provide proofs and additional details for Section 3.

A.1 Norm constraint: total vs. individual

We begin with a formal derivation of the formulas in Section 3.1.

A.4 More formally on the results of Section 3.2

In this section, we provide a more formal argument on the results of Section 3.2, building on the results of Section 3.1 and solving the problem with the projected gradient method. Here we also provide additional plots depicting the behavior of individual ELRs in the toy example at the end of Section 3.2.