Revisit last-iterate convergence of mSGD under milder requirement on step size

Neural Information Processing Systems

Understanding convergence of SGD-based optimization algorithms can help deal with enormous machine learning problems. To ensure last-iterate convergence of SGD and momentum-based SGD (mSGD), the existing studies usually constrain the step size \epsilon_{n} to decay as \sum_{n=1}^{\infty}\epsilon_{n}^{2} < \infty, which however is rather conservative and may lead to slow convergence in the early stage of the iteration. In this paper, we relax this requirement by studying an alternate step size for the mSGD. This implies that a larger step size, such as \epsilon_{n} = \frac{1}{\sqrt{n}}, can be utilized for accelerating the mSGD in the early stage. Under this new step size and some common conditions, we prove that the gradient norm of mSGD for non-convex loss functions asymptotically decays to zero.
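The step-size schedule above can be sketched on a toy problem. The snippet below is a minimal, illustrative implementation of momentum SGD with \epsilon_n = 1/\sqrt{n} on a 1-D quadratic f(w) = w^2/2 (so the gradient is w); the momentum coefficient and step count are assumed values, not taken from the paper.

```python
import math

def msgd(grad, w0, beta=0.9, n_steps=2000):
    """Momentum SGD with the larger step size eps_n = 1/sqrt(n)."""
    w, m = w0, 0.0
    for n in range(1, n_steps + 1):
        eps_n = 1.0 / math.sqrt(n)           # relaxed (non square-summable) step size
        m = beta * m + (1 - beta) * grad(w)  # exponential moving average of gradients
        w = w - eps_n * m                    # parameter update
    return w

# On f(w) = w^2/2 the gradient norm is |w|; it decays toward zero.
w_final = msgd(lambda w: w, w0=5.0)
```

Note that \sum_n 1/n diverges while the schedule still decays, which is what allows larger early steps than a square-summable schedule would.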


Learning from Streaming Data when Users Choose

Su, Jinyan, Dean, Sarah

arXiv.org Artificial Intelligence

In digital markets comprised of many competing services, each user chooses between multiple service providers according to their preferences, and the chosen service makes use of the user data to incrementally improve its model. The service providers' models influence which service the user will choose at the next time step, and the user's choice, in return, influences the model update. Moreover, due to the data-driven nature of digital platforms, interesting dynamics emerge among users and service providers: on the one hand, users choose amongst providers based on the quality of their services; on the other hand, providers use the user data to improve and update their services, affecting future user choices (Ginart et al., 2021; Kwon et al., 2022; Dean et al., 2024; Jagadeesan et al., 2023a). For example, in personalized music streaming, a user chooses amongst different music streaming platforms based on how well they meet the user's needs.


Revisiting Outer Optimization in Adversarial Training

Dabouei, Ali, Taherkhani, Fariborz, Soleymani, Sobhan, Nasrabadi, Nasser M.

arXiv.org Artificial Intelligence

Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper aims to analyze this choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces higher gradient norm and variance compared to NT. This phenomenon hinders the outer optimization in AT since the convergence rate of MSGD is highly dependent on the variance of the gradients. To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. We prove that the convergence rate of ENGM is independent of the variance of the gradients, and thus, it is suitable for AT. We introduce a trick to reduce the computational cost of ENGM using empirical observations on the correlation between the norm of gradients w.r.t. the network parameters and input examples. Our extensive evaluations and ablation studies on CIFAR-10, CIFAR-100, and TinyImageNet demonstrate that ENGM and its variants consistently improve the performance of a wide range of AT methods.
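The abstract does not spell out ENGM's update rule; one common way to "regularize the contribution of each input example to the average mini-batch gradients" is to clip each per-example gradient to a maximum norm before averaging. The sketch below uses that stand-in as an assumption and should not be read as the paper's exact method.

```python
import numpy as np

def clipped_mean_gradient(per_example_grads, max_norm=1.0):
    """Average per-example gradients after capping each one's norm.

    Illustrative stand-in for bounding individual examples' influence;
    max_norm is a hypothetical hyperparameter.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
        clipped.append(g * scale)
    return np.mean(clipped, axis=0)

grads = [np.array([3.0, 4.0]),   # norm 5 -> rescaled to norm 1
         np.array([0.3, 0.4])]   # norm 0.5 -> left unchanged
avg = clipped_mean_gradient(grads)
```

Because every clipped gradient has norm at most `max_norm`, the averaged gradient does too, which caps the variance contribution of any single (e.g. adversarial) example.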


Stochastic Normalized Gradient Descent with Momentum for Large Batch Training

Zhao, Shen-Yi, Xie, Yin-Peng, Li, Wu-Jun

arXiv.org Machine Learning

Stochastic gradient descent (SGD) and its variants have been the dominating optimization methods in machine learning. Compared with small batch training, SGD with large batch training can better utilize the computational power of current multi-core systems like GPUs and can reduce the number of communication rounds in distributed training. Hence, SGD with large batch training has attracted more and more attention. However, existing empirical results show that large batch training typically leads to a drop of generalization accuracy. As a result, large batch training has also become a challenging topic. In this paper, we propose a novel method, called stochastic normalized gradient descent with momentum (SNGM), for large batch training. We theoretically prove that compared to momentum SGD (MSGD) which is one of the most widely used variants of SGD, SNGM can adopt a larger batch size to converge to the $\epsilon$-stationary point with the same computation complexity (total number of gradient computation). Empirical results on deep learning also show that SNGM can achieve the state-of-the-art accuracy with a large batch size.
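A minimal sketch of the normalized-gradient-plus-momentum idea, on a toy 1-D quadratic: the stochastic gradient is rescaled to unit norm before entering the momentum buffer. The hyperparameters and the exact placement of the normalization are illustrative assumptions; the paper's precise formulation may differ.

```python
def sngm_step(w, m, grad, lr=0.01, beta=0.9):
    """One step of (sketched) stochastic normalized gradient descent with momentum."""
    g = grad(w)
    norm = abs(g)
    g_hat = g / norm if norm > 0 else 0.0  # unit-norm gradient direction
    m = beta * m + g_hat                   # momentum on the normalized gradient
    return w - lr * m, m

# Toy run on f(w) = w^2/2, where grad f(w) = w.
w, m = 2.0, 0.0
for _ in range(500):
    w, m = sngm_step(w, m, lambda w: w)
```

Since the per-step update magnitude is bounded by lr/(1-beta) regardless of how large or noisy the raw gradient is, the step size is insulated from gradient variance, which is the property that lets the batch size grow.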


Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks

Yaguchi, Atsushi, Suzuki, Taiji, Asano, Wataru, Nitta, Shuhei, Sakata, Yukinobu, Tanizawa, Akiyuki

arXiv.org Machine Learning

In recent years, deep neural networks (DNNs) have been applied to various machine learning tasks, including image recognition, speech recognition, and machine translation. However, large DNN models are needed to achieve state-of-the-art performance, exceeding the capabilities of edge devices. Model reduction is thus needed for practical use. In this paper, we point out that deep learning automatically induces group sparsity of weights, in which all weights connected to an output channel (node) are zero, when training DNNs under the following three conditions: (1) rectified-linear-unit (ReLU) activations, (2) an $L_2$-regularized objective function, and (3) the Adam optimizer. Next, we analyze this behavior both theoretically and experimentally, and propose a simple model reduction method: eliminate the zero weights after training the DNN. In experiments on MNIST and CIFAR-10 datasets, we demonstrate the sparsity with various training setups. Finally, we show that our method can efficiently reduce the model size and performs well relative to methods that use a sparsity-inducing regularizer.
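The model-reduction step described above, eliminating the zero weights after training, can be sketched as dropping the output channels (here, rows of a layer's weight matrix) whose weights are all numerically zero. The tolerance and the rows-as-channels layout are illustrative assumptions.

```python
import numpy as np

def prune_zero_channels(W, b, tol=1e-8):
    """Drop output channels whose weights are all (numerically) zero.

    W: (out_channels, in_features) weight matrix; b: (out_channels,) bias.
    Returns the pruned weights/bias and the boolean keep-mask.
    """
    keep = np.any(np.abs(W) > tol, axis=1)  # rows with at least one nonzero weight
    return W[keep], b[keep], keep

W = np.array([[0.5, -0.2],
              [0.0,  0.0],   # dead channel: all weights zero
              [1.0,  0.3]])
b = np.array([0.1, 0.0, -0.4])
W2, b2, keep = prune_zero_channels(W, b)
```

In a multi-layer network the next layer's corresponding input columns must be removed with the same mask, so that layer shapes stay consistent after pruning.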