Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps
In reinforcement learning (RL), it is common to apply techniques used broadly in machine learning, such as neural network function approximators and momentum-based optimizers. However, such tools were largely developed for supervised learning rather than nonstationary RL, leading practitioners to adopt target networks, clipped policy updates, and other RL-specific implementation tricks to combat this mismatch, rather than directly adapting this toolchain for use in RL. In this paper, we take a different approach and instead address the effect of nonstationarity by adapting the widely used Adam optimiser. We first analyse the impact of nonstationary gradient magnitude, such as that caused by a change in target network, on Adam's update size, demonstrating that such a change can lead to large updates and hence sub-optimal performance. To address this, we introduce Adam-Rel. Rather than using the global timestep in the Adam update, Adam-Rel uses the timestep within an epoch, essentially resetting Adam's timestep to 0 after target changes. We demonstrate that this avoids large updates and reduces to learning rate annealing in the absence of such increases in gradient magnitude. Evaluating Adam-Rel in both on-policy and off-policy RL, we demonstrate improved performance in both Atari and Craftax. We then show that increases in gradient norm occur in RL in practice, and examine the differences between our theoretical model and the observed data.
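The timestep reset described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `adam_rel_step`, the default hyperparameters, and the convention of passing the post-reset step count explicitly are all our assumptions. The key point is that the bias-correction terms use a relative timestep that is reset after each target change, while the moment estimates themselves are kept.

```python
import numpy as np

def adam_rel_step(param, grad, m, v, t_rel,
                  lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using a *relative* timestep.

    t_rel is the number of steps taken since the last target-network
    change (1 on the first step after a reset), rather than the global
    step count used by standard Adam.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction uses the local (relative) timestep, so right after
    # a reset the update magnitude is again bounded close to lr.
    m_hat = m / (1 - beta1 ** t_rel)
    v_hat = v / (1 - beta2 ** t_rel)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

In this sketch the caller would set `t_rel = 1` on the first step after a target change and increment it on each subsequent step; the moments `m` and `v` carry over across resets, so only the bias correction restarts.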
Time-Reversed Dissipation Induces Duality Between Minimizing Gradient Norm and Function Value
In convex optimization, first-order optimization methods that efficiently minimize function values have been a central subject of study since Nesterov's seminal work of 1983. Recently, however, Kim and Fessler's OGM-G and Lee et al.'s FISTA-G have been presented as alternatives that efficiently minimize the gradient magnitude instead. In this paper, we present H-duality, which represents a surprising one-to-one correspondence between methods efficiently minimizing function values and methods efficiently minimizing gradient magnitude.
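The two regimes contrasted in the abstract can be summarized by the standard accelerated rates for an $L$-smooth convex function $f$ with minimizer $x_\star$ (indicative big-$O$ forms from the literature, not the paper's exact constants):

```latex
\[
  f(x_N) - f(x_\star) \le O\!\left(\frac{L\,\|x_0 - x_\star\|^2}{N^2}\right)
  \quad \text{(function-value minimization, e.g.\ Nesterov's method)}
\]
\[
  \|\nabla f(y_N)\|^2 \le O\!\left(\frac{L\,\bigl(f(x_0) - f(x_\star)\bigr)}{N^2}\right)
  \quad \text{(gradient-norm minimization, e.g.\ OGM-G)}
\]
```

Note the symmetry between the two bounds: the initial distance $\|x_0 - x_\star\|^2$ in the first is replaced by the initial suboptimality $f(x_0) - f(x_\star)$ in the second, which is the kind of correspondence the H-duality result formalizes.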
We thank the four reviewers for their constructive comments. The following are our responses to the reviewers' comments. We will rewrite the formulations in the revision. The classifier is trained with learning rate in {2, 5, 10} and batch size 256 for 50 epochs; the best accuracy of the classifier is reported. For NNP we test s = 25 and 64, and for MS we use the original authors' GitHub implementation.
A Observations in Local Memory Similarity
We observed local memory similarity through Q-Q (quantile-quantile) plots, as shown in Figure A1. In Figure A1(a), the linearity of the points in the Q-Q plot suggests that worker 1's local memory follows a distribution similar to the other workers'. This is consistent with our observations of pairwise cosine distance shown in Figure 2(a), and indicates that we can possibly use a local worker's top-k as a proxy for the true top-k. One variant of Young's inequality is ‖x + y‖² ≤ 2‖x‖² + 2‖y‖², and the quadrilateral identity is ⟨x, y⟩ = ½(‖x‖² + ‖y‖² − ‖x − y‖²), where x⋆ denotes a global minimum of f(x). We provide the following table to explain Section 3's main results and connect them to other parts of the paper; our Theorem 1 shows this and indicates its applicability in distributed training.
- Lemma 1 (contraction property). Intuition: higher correlation between workers brings CLT-k closer to the true top-k. Evidence: Figures 2 and 3 show high correlation, so our contraction is close to the true top-k.
- Lemma 2 (contraction in the distributed setting). Requirement: positive correlation between workers in the distributed setting. Evidence: Figures 2 and 3 show positive correlation between workers.
- Theorem 1 (ScaleCom's convergence rate is the same as SGD's, O(1/√T)). Evidence: Tables 1 and 2 (Figures 4 and 5) verify that ScaleCom's convergence matches the baseline.
Each node is equipped with 2 IBM POWER9 processors clocked at 3.15 GHz.
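The two elementary facts invoked above, Young's inequality in the form ‖x + y‖² ≤ 2‖x‖² + 2‖y‖² and the quadrilateral (polarization) identity, can be checked numerically. This is a quick sanity-check sketch using their standard textbook forms, which we have reconstructed here; it is not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(size=5)

# Variant of Young's inequality: ||x + y||^2 <= 2||x||^2 + 2||y||^2
lhs = np.linalg.norm(x + y) ** 2
rhs = 2 * np.linalg.norm(x) ** 2 + 2 * np.linalg.norm(y) ** 2
assert lhs <= rhs

# Quadrilateral (polarization) identity:
# <x, y> = (||x||^2 + ||y||^2 - ||x - y||^2) / 2
inner = (np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
         - np.linalg.norm(x - y) ** 2) / 2
assert np.isclose(inner, x @ y)
```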
Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks
Van den Berghe, Korneel; Stroobants, Stein; Reddi, Vijay Janapa; de Croon, G. C. H. E.
Neuromorphic computing systems are set to revolutionize energy-constrained robotics by achieving orders-of-magnitude efficiency gains, while enabling native temporal processing. Spiking Neural Networks (SNNs) represent a promising algorithmic approach for these systems, yet their application to complex control tasks faces two critical challenges: (1) the non-differentiable nature of spiking neurons necessitates surrogate gradients with unclear optimization properties, and (2) the stateful dynamics of SNNs require training on sequences, which in reinforcement learning (RL) is hindered by limited sequence lengths during early training, preventing the network from bridging its warm-up period. We address these challenges by systematically analyzing surrogate gradient slope settings, showing that shallower slopes increase gradient magnitude in deeper layers but reduce alignment with true gradients. In supervised learning, we find no clear preference for fixed or scheduled slopes. The effect is much more pronounced in RL settings, where shallower slopes or scheduled slopes lead to a 2.1x improvement in both training and final deployed performance. Next, we propose a novel training approach that leverages a privileged guiding policy to bootstrap the learning process, while still exploiting online environment interactions with the spiking policy. Combining our method with an adaptive slope schedule for a real-world drone position control task, we achieve an average return of 400 points, substantially outperforming prior techniques, including Behavioral Cloning and TD3BC, which achieve at most -200 points under the same conditions. This work advances both the theoretical understanding of surrogate gradient learning in SNNs and practical training methodologies for neuromorphic controllers demonstrated in real-world robotic systems.
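The surrogate-gradient idea discussed in this abstract can be illustrated with a minimal sketch: the forward pass uses the non-differentiable Heaviside spike function, while the backward pass substitutes the derivative of a smooth approximation whose sharpness is set by a slope parameter. The sigmoid surrogate and the function names below are our assumptions for illustration, not necessarily the paper's exact surrogate.

```python
import numpy as np

def spike_forward(v, threshold=1.0):
    """Forward pass: non-differentiable Heaviside spike nonlinearity."""
    return (np.asarray(v) >= threshold).astype(float)

def surrogate_grad(v, threshold=1.0, slope=5.0):
    """Backward pass: derivative of a sigmoid surrogate.

    `slope` controls sharpness. A shallower (smaller) slope spreads
    nonzero gradient to membrane potentials far from the threshold,
    at the cost of a poorer local approximation of the true step.
    """
    s = 1.0 / (1.0 + np.exp(-slope * (np.asarray(v) - threshold)))
    return slope * s * (1.0 - s)
```

Consistent with the abstract's observation, far from the threshold a shallow slope passes more gradient than a steep one (e.g. `surrogate_grad(0.0, slope=1.0)` exceeds `surrogate_grad(0.0, slope=10.0)`), which is why slope settings interact with gradient magnitude in deeper layers.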