Evolving Standardization for Continual Domain Generalization over Temporal Drift

Neural Information Processing Systems

The capability of generalizing to out-of-distribution data is crucial for deploying machine learning models in the real world. Existing domain generalization (DG) work mainly addresses offline, discrete scenarios, in which multiple source domains are simultaneously accessible and the distribution shift among domains is abrupt and severe. However, such a setting is not universally applicable: in many real-world applications the data distribution changes gradually over time due to various factors, e.g., the process of aging. Additionally, as the domain constantly evolves, new domains will continually emerge, and re-training and updating models on both new and previous domains with existing DG methods is resource-intensive and inefficient.






Designing Multi-Step Action Models for Enterprise AI Adoption

Mishra, Shreyash, Shah, Shrey, Pereira, Rex

arXiv.org Artificial Intelligence

This paper introduces the Multi-Step Action Model (MSAM), a closed-source AI model designed by Empsing to address challenges hindering AI adoption in enterprises. Through a holistic examination, this paper explores MSAM's foundational principles, design architecture, and future trajectory. It evaluates MSAM's performance via rigorous testing methodologies and envisions its potential impact on advancing AI adoption within organizations.


Smart Textile-Driven Soft Spine Exosuit for Lifting Tasks in Industrial Applications

Zhu, Kefan, Sharma, Bibhu, Phan, Phuoc Thien, Davies, James, Thai, Mai Thanh, Hoang, Trung Thien, Nguyen, Chi Cong, Ji, Adrienne, Nicotra, Emanuele, Lovell, Nigel H., Do, Thanh Nho

arXiv.org Artificial Intelligence

Work-related musculoskeletal disorders (WMSDs) are often caused by repetitive lifting, making them a significant concern in occupational health. Although wearable assist devices have become the norm for mitigating the risk of back pain, most spinal assist devices still have a partially rigid structure that impairs user comfort and flexibility. This paper addresses this issue by presenting a smart-textile-actuated spine assistance robotic exosuit (SARE), which conforms to the back seamlessly without impeding the user's movement and is extremely lightweight. The SARE can assist the human erector spinae in completing actions with virtually infinite degrees of freedom. To detect strain on the spine and control the smart textile automatically, a soft knitted sensor that uses fluid pressure as its sensing element is employed. The new device is validated experimentally with human subjects, where it reduces peak electromyography (EMG) signals of the lumbar erector spinae by around 32 percent in loaded and around 22 percent in unloaded conditions. Moreover, the integrated EMG decreased by around 24.2 percent under the loaded condition and around 23.6 percent under the unloaded condition. In summary, this artificial-muscle wearable device represents an anatomical solution for reducing the risk of muscle strain, metabolic energy cost, and back pain associated with repetitive lifting tasks.


Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Becker, Marlon, Altrock, Frederick, Risse, Benjamin

arXiv.org Artificial Intelligence

Sharpness Aware Minimization (SAM), a recently proposed optimization algorithm for deep neural networks, perturbs parameters by a gradient-ascent step before the gradient calculation in order to guide the optimization into flat regions of the loss landscape. While significant generalization improvements, and thus reduced overfitting, have been demonstrated, the computational cost is doubled by the additional gradient calculation, making SAM infeasible when computational capacity is limited. Motivated by Nesterov Accelerated Gradient (NAG), we propose Momentum-SAM (MSAM), which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational or memory overhead over SGD or Adam. We evaluate MSAM in detail and reveal insights into the separable mechanisms of NAG, SAM, and MSAM regarding training optimization and generalization.

While artificial neural networks (ANNs) are typically trained by Empirical Risk Minimization (ERM), i.e., the minimization of a predefined loss function on a finite set of training data, the actual purpose is to generalize beyond this dataset and fit the model to the underlying data distribution. Consequently, a fundamental challenge in designing network architectures and training procedures is to ensure that the ERM objective is an adequate proxy for learning the underlying data distribution. One strategy to tackle this problem is to exploit properties of the loss landscape over the parameter space on the training data. A strong link between sharpness in this loss landscape and a model's generalization capability was proposed by Hochreiter & Schmidhuber (1994) and further analyzed in the work of Keskar et al. (2017). Following these works, Foret et al. (2021) proposed Sharpness Aware Minimization (SAM), an algorithm that explicitly reduces the sharpness of loss minima and thereby improves generalization performance. Built on top of gradient-based optimizers such as SGD or Adam (Kingma & Ba, 2015), SAM searches for a loss maximum in a limited parameter vicinity at each optimization step and calculates the loss gradient at this ascended parameter position.
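The core difference from SAM, perturbing along the accumulated momentum vector instead of running an extra gradient-ascent pass, can be sketched in a few lines of NumPy. The function name, hyperparameter values, and toy quadratic loss below are illustrative assumptions for this sketch, not the authors' released implementation:

```python
import numpy as np

def msam_step(w, grad_fn, m, lr=0.1, beta=0.9, rho=0.05):
    """One Momentum-SAM-style step (sketch): perturb the weights along the
    normalized momentum vector, evaluate a single gradient there, then
    apply a standard momentum-SGD update."""
    norm = np.linalg.norm(m)
    eps = rho * m / norm if norm > 0 else np.zeros_like(w)
    g = grad_fn(w + eps)          # only one gradient evaluation per step
    m = beta * m + g              # accumulate momentum
    return w - lr * m, m

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
grad_fn = lambda w: w
w, m = np.array([2.0, -1.0]), np.zeros(2)
for _ in range(200):
    w, m = msam_step(w, grad_fn, m)
# w is driven into a small neighborhood of the minimum at the origin
```

Unlike SAM, which needs a second backward pass to compute the ascent direction, the perturbation here reuses the momentum buffer that the optimizer already maintains, so the per-step cost stays close to plain momentum SGD.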


mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Behdin, Kayhan, Song, Qingquan, Gupta, Aman, Keerthi, Sathiya, Acharya, Ayan, Ocejo, Borja, Dexter, Gregory, Khanna, Rajiv, Durfee, David, Mazumder, Rahul

arXiv.org Machine Learning

Modern deep learning models are over-parameterized, and different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the underlying loss function to steer gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization. Our study examines a specific variant of SAM known as micro-batch SAM (mSAM), which aggregates updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to show theoretically that SAM achieves flatter minima than SGD, and that mSAM achieves even flatter minima than SAM. We substantiate this theory with a thorough empirical evaluation on various image classification and natural language processing tasks. We also show that, contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational cost. Our implementation of mSAM yields superior generalization performance across a wide range of tasks compared to SAM, further supporting our theoretical framework.
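The shard-and-average scheme described above can be sketched as follows; the toy least-squares problem, function names, and hyperparameter values are assumptions made for illustration, not the paper's implementation:

```python
import numpy as np

def msam_update(w, shard_indices, loss_grad, rho=0.05, lr=0.1):
    """One mSAM step (sketch): each micro-batch shard computes its own
    SAM-style ascent perturbation and gradient; the shard gradients are
    averaged before a single descent step."""
    grads = []
    for shard in shard_indices:
        g = loss_grad(w, shard)
        norm = np.linalg.norm(g)
        eps = rho * g / norm if norm > 0 else np.zeros_like(w)
        grads.append(loss_grad(w + eps, shard))  # gradient at ascended point
    return w - lr * np.mean(grads, axis=0)

# Toy 1-D least squares with targets y = 3x; the minimum is at w = 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 1))
y = 3.0 * x
def loss_grad(w, idx):
    xb, yb = x[idx], y[idx]
    return (2.0 * xb * (xb * w - yb)).mean(axis=0)

shards = np.array_split(np.arange(32), 4)  # 4 disjoint micro-batches
w = np.zeros(1)
for _ in range(50):
    w = msam_update(w, shards, loss_grad)
```

Because each shard's perturbation and gradient are independent of the others, the loop over shards can be parallelized, which is what makes the flexible implementation mentioned in the abstract plausible without a large added cost.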


Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

Behdin, Kayhan, Song, Qingquan, Gupta, Aman, Durfee, David, Acharya, Ayan, Keerthi, Sathiya, Mazumder, Rahul

arXiv.org Artificial Intelligence

Modern deep learning models are over-parameterized, so the optimization setup strongly affects generalization performance. A key element of reliable optimization for such systems is modification of the loss function. Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods toward flatter minima, which arguably generalize better. In this paper, we focus on a variant of SAM known as mSAM, which, during training, averages the updates generated by adversarial perturbations across several disjoint shards of a mini-batch. Recent work suggests that mSAM can outperform SAM in terms of test accuracy. However, a comprehensive empirical study of mSAM is missing from the literature; previous results have mostly been limited to specific architectures and datasets. To that end, this paper presents a thorough empirical evaluation of mSAM on various tasks and datasets. We provide a flexible implementation of mSAM and compare its generalization performance to that of SAM and vanilla training on different image classification and natural language processing tasks. We also conduct careful experiments to understand the computational cost of training with mSAM, its sensitivity to hyperparameters, and its correlation with the flatness of the loss landscape. Our analysis reveals that mSAM yields superior generalization performance and flatter minima compared to SAM across a wide range of tasks, without significantly increasing computational costs.