AITopics | gsnr


gsnr


Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)

Jiang, Guo-qing, Liu, Jinlong, Ding, Zixiang, Guo, Lin, Lin, Wei

arXiv.org Artificial Intelligence, Sep-24-2023

As models for natural language processing (NLP), computer vision (CV) and recommendation systems (RS) demand ever more computation, large numbers of GPUs/TPUs are run in parallel with a large batch (LB) to improve training throughput. However, LB training often suffers from a large generalization gap and lower final accuracy, which limits how far the batch size can be increased. In this work, we develop the variance reduced gradient descent technique (VRGD) based on the gradient signal to noise ratio (GSNR) and apply it to popular optimizers such as SGD/Adam/LARS/LAMB. We carry out a theoretical analysis of the convergence rate to explain its fast training dynamics, and a generalization analysis to demonstrate its smaller generalization gap in LB training. Comprehensive experiments demonstrate that VRGD can accelerate training ($1\sim 2 \times$), narrow the generalization gap and improve final accuracy. We push the batch size limit of BERT pretraining up to 128k/64k and of DLRM to 512k without noticeable accuracy loss. We improve ImageNet Top-1 accuracy at a batch size of 96k by $0.52$ pp over LARS. The generalization gap of BERT and ImageNet training is reduced by over $65\%$.

batch training, gradient signal, noise ratio, (1 more...)


2309.13681

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (1.00)


Understanding Why Neural Networks Generalize Well Through GSNR of Parameters

Liu, Jinlong, Jiang, Guoqing, Bai, Yunzhi, Chen, Ting, Wang, Huayan

arXiv.org Machine Learning, Jan-21-2020

The GSNR of a parameter is defined as the ratio between its gradient's squared mean and variance … Previous work (Zhang et al., 2016; Hardt et al., 2015; Dziugaite & Roy, 2017) suggests that the … Previous work tried to use GSNR to conduct theoretical analysis on deep learning. For example, Rainforth et al. (2018) used GSNR to analyze variational bounds in … Intuitively, GSNR measures the similarity of a parameter's gradients among different training samples. To reveal the mechanism of DNNs' good generalization ability, we show that the gradient descent … We believe this is probably the key to DNNs' remarkable generalization ability. In the remainder of this paper we first analyze the relation between GSNR and generalization (Section 2). At a particular point of the parameter space, GSNR measures the consistency of a parameter's gradients across different data samples.
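
The definition quoted in this excerpt (squared mean of a parameter's gradient divided by its variance) can be illustrated with a minimal sketch. This is not the authors' code: the function name `gsnr`, the `eps` stabilizer, and the use of the population variance over a small set of per-sample gradients as a stand-in for the variance over the data distribution are all assumptions made for illustration.

```python
from statistics import fmean, pvariance


def gsnr(per_sample_grads, eps=1e-12):
    """Return one GSNR estimate per parameter (illustrative sketch).

    per_sample_grads[i][j] is the gradient of sample i's loss with
    respect to parameter j. `eps` is a hypothetical stabilizer that
    avoids division by zero when all gradients are identical.
    """
    n_params = len(per_sample_grads[0])
    ratios = []
    for j in range(n_params):
        # Gradients of parameter j across all samples.
        column = [g[j] for g in per_sample_grads]
        mean = fmean(column)
        # Assumption: population variance over the given samples.
        var = pvariance(column, mu=mean)
        ratios.append(mean * mean / (var + eps))
    return ratios


# Toy data: parameter 0 receives consistent gradients across samples,
# parameter 1 receives gradients that cancel out on average.
grads = [
    [1.0, 1.0],
    [1.1, -1.0],
    [0.9, 1.0],
    [1.0, -1.0],
]
values = gsnr(grads)
```

In this toy input, the first parameter's gradients agree across samples and so its GSNR is large, while the second parameter's gradients point in opposite directions and its GSNR is close to zero, matching the excerpt's description of GSNR as a measure of gradient consistency across data samples.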

gradient, gsnr, model parameter, (15 more...)


2001.07384

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)
