AITopics | gradient signal

gradient signal

Federated Learning over Connected Modes

Neural Information Processing Systems, Mar-21-2026, 18:43:04 GMT

Statistical heterogeneity in federated learning poses two major challenges: slow global training due to conflicting gradient signals, and the need for personalization to local distributions. In this work, we tackle both challenges by leveraging recent advances in linear mode connectivity: identifying a linearly connected low-loss region in the parameter space of neural networks, which we call the solution simplex. We propose federated learning over connected modes (Floco), where clients are assigned local subregions in this simplex based on their gradient signals, and together learn the shared global solution simplex. This allows personalization of the client models to fit their local distributions within the degrees of freedom in the solution simplex and homogenizes the update signals for the global simplex training. Our experiments show that Floco accelerates the global training process and significantly improves the local accuracy with minimal computational overhead in cross-silo federated learning settings.
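
The idea of a linearly connected solution simplex can be illustrated with a minimal sketch: any model inside the simplex is a convex combination of a few endpoint parameter vectors. The function names, the toy two-dimensional parameter vectors, and the uniform sampling scheme below are assumptions made for illustration; this is not the Floco implementation, and the assignment of clients to subregions is not shown.

```python
import random

def sample_simplex_weights(num_vertices, rng):
    """Draw barycentric weights uniformly from the probability simplex."""
    # Normalized exponential draws give a flat Dirichlet sample.
    draws = [rng.expovariate(1.0) for _ in range(num_vertices)]
    total = sum(draws)
    return [d / total for d in draws]

def simplex_point(vertices, weights):
    """Return the convex combination of the vertex parameter vectors."""
    dim = len(vertices[0])
    return [sum(w * v[i] for w, v in zip(weights, vertices)) for i in range(dim)]

# Three hypothetical endpoint models, each a flat parameter vector.
vertices = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
rng = random.Random(0)
weights = sample_simplex_weights(len(vertices), rng)
theta = simplex_point(vertices, weights)  # parameters of one model in the simplex
```

Restricting a client to a subregion would amount to restricting the set of weights it may sample, which is one way to read the "degrees of freedom" mentioned in the abstract.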

artificial intelligence, machine learning, proceedings, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization

Neural Information Processing Systems, Oct-3-2025, 07:39:30 GMT

Instead, the main role of latent weights is to provide inertia during training. We interpret current methods in terms of inertia and provide novel insights into the optimization of BNNs.

arxiv preprint arxiv, latent weight, optimizer, (14 more...)

Country:

North America > Canada (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout

Neural Information Processing Systems, Oct-2-2025, 05:22:20 GMT

The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights.

artificial intelligence, deep learning, machine learning, (14 more...)

Country: North America (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Gaussian Primitive Optimized Deformable Retinal Image Registration

Tian, Xin, Wang, Jiazheng, Zhang, Yuxi, Chen, Xiang, Hu, Renjiu, Li, Gaolei, Liu, Min, Zhang, Hang

arXiv.org Artificial Intelligence, Aug-26-2025

Deformable retinal image registration is notoriously difficult due to large homogeneous regions and sparse but critical vascular features, which cause limited gradient signals in standard learning-based frameworks. In this paper, we introduce Gaussian Primitive Optimization (GPO), a novel iterative framework that performs structured message passing to overcome these challenges. After an initial coarse alignment, we extract keypoints at salient anatomical structures (e.g., major vessels) to serve as a minimal set of descriptor-based control nodes (DCN). Each node is modelled as a Gaussian primitive with trainable position, displacement, and radius, thus adapting its spatial influence to local deformation scales. A K-Nearest Neighbors (KNN) Gaussian interpolation then blends and propagates displacement signals from these information-rich nodes to construct a globally coherent displacement field; focusing interpolation on the top K neighbors reduces computational overhead while preserving local detail. By strategically anchoring nodes in high-gradient regions, GPO ensures robust gradient flow, mitigating vanishing gradient signals in textureless areas. The framework is optimized end-to-end via a multi-term loss that enforces both keypoint consistency and intensity alignment. Experiments on the FIRE dataset show that GPO reduces the target registration error from 6.2 px to ~2.4 px and increases the AUC at 25 px from 0.770 to 0.938, substantially outperforming existing methods. The source code can be accessed via https://github.com/xintian-99/GPOreg.
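
The KNN Gaussian interpolation step described above can be sketched roughly as follows. The node layout (a dict with `pos`, `disp`, and `radius` keys), the function name, and the choice to normalize the Gaussian weights are assumptions for illustration; the linked repository is the reference for the actual method.

```python
import math

def knn_gaussian_displacement(query, nodes, k=2):
    """Blend node displacements at `query` using its k nearest Gaussian nodes.

    Each node is a dict with hypothetical keys 'pos' and 'disp' (2D tuples)
    and 'radius' (a positive float controlling spatial influence).
    """
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Keep only the k control nodes closest to the query point.
    nearest = sorted(nodes, key=lambda n: sq_dist(query, n["pos"]))[:k]
    # Gaussian weight per node; a larger radius means a wider influence.
    weights = [
        math.exp(-sq_dist(query, n["pos"]) / (2.0 * n["radius"] ** 2))
        for n in nearest
    ]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: the query is far from every node.
        return (0.0, 0.0)
    dx = sum(w * n["disp"][0] for w, n in zip(weights, nearest)) / total
    dy = sum(w * n["disp"][1] for w, n in zip(weights, nearest)) / total
    return (dx, dy)

nodes = [
    {"pos": (0.0, 0.0), "disp": (2.0, 0.0), "radius": 1.0},
    {"pos": (2.0, 0.0), "disp": (0.0, 2.0), "radius": 1.0},
    {"pos": (100.0, 100.0), "disp": (50.0, 50.0), "radius": 1.0},
]
midpoint_disp = knn_gaussian_displacement((1.0, 0.0), nodes, k=2)
```

With `k=2`, the distant third node never enters the blend, which is the source of the computational saving the abstract attributes to restricting interpolation to the top K neighbors.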

artificial intelligence, machine learning, pattern recognition, (19 more...)

2508.16852

Country:

Asia > China (0.28)
Europe (0.28)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.66)
(2 more...)

Federated Learning over Connected Modes

Neural Information Processing Systems, May-27-2025, 10:17:59 GMT

connected mode, federated learning, solution simplex, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Review for NeurIPS paper: Black-Box Optimization with Local Generative Surrogates

Neural Information Processing Systems, Jan-27-2025, 09:31:50 GMT

Additional Feedback: Introduction: The focus of this work seems to be on cases where both the inputs and the simulator are stochastic. This would be equally applicable to scenarios which are deterministic but otherwise non-differentiable, right? One related work that comes to mind is Sobolev training. It's a bit different in motivation and setup but might be nice to cite. The introduction is generally well motivated and concise.

algorithm, generative model, gradient, (11 more...)

Industry: Transportation > Air (0.41)

Technology: Information Technology > Artificial Intelligence (0.84)

Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)

Jiang, Guo-qing, Liu, Jinlong, Ding, Zixiang, Guo, Lin, Lin, Wei

arXiv.org Artificial Intelligence, Sep-24-2023

As models for natural language processing (NLP), computer vision (CV) and recommendation systems (RS) require surging computation, a large number of GPUs/TPUs are run in parallel as a large batch (LB) to improve training throughput. However, training such LB tasks often suffers from a large generalization gap and degraded final precision, which limits how far the batch size can be enlarged. In this work, we develop the variance reduced gradient descent technique (VRGD) based on the gradient signal to noise ratio (GSNR) and apply it to popular optimizers such as SGD/Adam/LARS/LAMB. We carry out a theoretical analysis of convergence rate to explain its fast training dynamics, and a generalization analysis to demonstrate its smaller generalization gap on LB training. Comprehensive experiments demonstrate that VRGD can accelerate training ($1\sim 2\times$), narrow the generalization gap and improve final accuracy. We push the batch size limit of BERT pretraining up to 128k/64k and DLRM to 512k without noticeable accuracy loss. We improve ImageNet Top-1 accuracy at 96k by 0.52 pp over LARS. The generalization gap of BERT and ImageNet training is significantly reduced, by over 65%.
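
As a rough illustration of the quantity the method builds on, the sketch below computes a per-parameter gradient signal to noise ratio, assuming the common definition of squared mean gradient divided by gradient variance across samples. That definition, the function name `gsnr`, and the `eps` stabilizer are assumptions; the abstract does not spell out the formula, and how VRGD uses the ratio inside an optimizer is not shown here.

```python
def gsnr(per_sample_grads, eps=1e-12):
    """Per-parameter GSNR: squared mean gradient over gradient variance.

    `per_sample_grads` is a list of gradient vectors, one per sample.
    `eps` is a hypothetical stabilizer that avoids division by zero.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    ratios = []
    for j in range(dim):
        column = [g[j] for g in per_sample_grads]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        ratios.append(mean ** 2 / (var + eps))
    return ratios

# Parameter 0: every sample agrees (high GSNR).
# Parameter 1: samples cancel each other out (zero GSNR).
grads = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
ratios = gsnr(grads)
```

A parameter whose per-sample gradients agree gets a large ratio, while one whose gradients conflict gets a ratio near zero, which is the kind of signal a variance-aware update rule could exploit.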

batch training, gradient signal, noise ratio, (1 more...)

2309.13681

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (1.00)

Angular upsampling in diffusion MRI using contextual HemiHex sub-sampling in q-space

Faiyaz, Abrar, Uddin, Md Nasir, Schifitto, Giovanni

arXiv.org Artificial Intelligence, Oct-31-2022

Artificial intelligence (deep learning (DL) / machine learning (ML)) techniques are widely used to address ill-posed problems in medical imaging that were, or still are, seemingly impossible to solve. Reducing the number of gradient directions while still obtaining high angular resolution (HAR) diffusion MR data that retains clinical features is an important and challenging problem in the field. While DL/ML approaches are promising, it is important to incorporate relevant context for the data to ensure that maximum prior information is provided for the AI model to infer the posterior. In this paper, we introduce HemiHex (HH) subsampling as a suggested way to address training data sampling on q-space geometry, followed by nearest neighbor regression training on the HH samples to finally upsample the dMRI data. Earlier studies have tried to use regression for upsampling dMRI data but suffered from performance issues because they failed to provide structured geometrical measures for inference. Our proposed approach is a geometrically optimized regression technique that infers the unknown q-space, thus addressing the limitations of the earlier studies.

artificial intelligence, deep learning, machine learning, (13 more...)

2211.0024

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.51)
Health & Medicine > Health Care Technology (0.50)
Health & Medicine > Therapeutic Area > Neurology (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Understanding Decoupled and Early Weight Decay

Bjorck, Johan, Weinberger, Kilian, Gomes, Carla

arXiv.org Machine Learning, Dec-26-2020

Weight decay (WD) is a traditional regularization technique in deep learning, but despite its ubiquity, its behavior is still an area of active research. Golatkar et al. have recently shown that WD only matters at the start of the training in computer vision, upending traditional wisdom. Loshchilov et al. show that for adaptive optimizers, manually decaying weights can outperform adding an $l_2$ penalty to the loss. This technique has become increasingly popular and is referred to as decoupled WD. The goal of this paper is to investigate these two recent empirical observations. We demonstrate that by applying WD only at the start, the network norm stays small throughout training. This has a regularizing effect as the effective gradient updates become larger. However, traditional generalization metrics fail to capture this effect of WD, and we show how a simple scale-invariant metric can. We also show how the growth of network weights is heavily influenced by the dataset and its generalization properties. For decoupled WD, we perform experiments in NLP and RL where adaptive optimizers are the norm. We demonstrate that the primary issue that decoupled WD alleviates is the mixing of gradients from the objective function and the $l_2$ penalty in the buffers of Adam (which store the estimates of the first-order moment). Adaptivity itself is not problematic and decoupled WD ensures that the gradients from the $l_2$ term cannot "drown out" the true objective, facilitating easier hyperparameter tuning.
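
The difference between an $l_2$ penalty and decoupled WD in Adam can be seen in a small scalar sketch. The function name `adam_step`, its defaults, and the single-parameter setup are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One scalar Adam step with an l2 penalty or decoupled weight decay."""
    if not decoupled:
        # l2 penalty: the decay term enters the moment buffers with the gradient.
        grad = grad + weight_decay * theta
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    new_theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # Decoupled decay: shrink the weight directly, bypassing the buffers.
        new_theta -= lr * weight_decay * theta
    return new_theta, m, v

# Zero objective gradient, so only the weight decay term acts on theta = 1.0.
l2_theta, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.01)
dec_theta, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.01,
                            decoupled=True)
```

In this toy case the $l_2$ variant moves the weight by roughly the full learning rate (to about 0.9), because Adam normalizes the small penalty gradient, while the decoupled variant shrinks it only by `lr * weight_decay` (to 0.999). This shows in miniature how the penalty term can mix into the buffers.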

gradient, (11 more...)

2012.13841

Country:

North America > United States (0.14)
Europe (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization

Helwegen, Koen, Widdicombe, James, Geiger, Lukas, Liu, Zechun, Cheng, Kwang-Ting, Nusselder, Roeland

arXiv.org Machine Learning, Jun-5-2019

Optimization of Binarized Neural Networks (BNNs) currently relies on real-valued latent weights to accumulate small update steps. In this paper, we argue that these latent weights cannot be treated analogously to weights in real-valued networks. Instead, their main role is to provide inertia during training. We interpret current methods in terms of inertia and provide novel insights into the optimization of BNNs. We subsequently introduce the first optimizer specifically designed for BNNs, Binary Optimizer (Bop), and demonstrate its performance on CIFAR-10 and ImageNet. Together, the redefinition of latent weights as inertia and the introduction of Bop enable a better understanding of BNN optimization and open up the way for further improvements in training methodologies for BNNs.
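
To make the notion of inertia concrete, here is a minimal sketch of a momentum-thresholded sign-flip rule for binary weights. It is written in the spirit of the abstract and is not confirmed to match Bop's exact update; the function name `bop_like_step`, the adaptivity rate `gamma`, and the threshold `tau` are assumptions for illustration.

```python
def bop_like_step(weights, grads, momentum, gamma=0.1, tau=0.05):
    """Update binary weights in {-1, +1} by thresholded sign flips.

    `gamma` (averaging rate) and `tau` (flip threshold) are hypothetical
    hyperparameters; no real-valued latent weights are stored.
    """
    new_weights, new_momentum = [], []
    for w, g, m in zip(weights, grads, momentum):
        # An exponential moving average of the gradient supplies the inertia.
        m = (1.0 - gamma) * m + gamma * g
        # Flip only if the averaged gradient is strong enough and descent
        # would push the weight toward the opposite sign.
        if abs(m) > tau and (m > 0) == (w > 0):
            w = -w
        new_weights.append(w)
        new_momentum.append(m)
    return new_weights, new_momentum

weights, momentum = bop_like_step(
    weights=[1, 1, -1], grads=[1.0, 0.1, 1.0], momentum=[0.0, 0.0, 0.0]
)
```

Only the first weight flips here: the second sees too weak a signal to overcome the threshold, and the third already has the sign that descent favors.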

artificial intelligence, latent weight, machine learning, (17 more...)

1906.02107

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
