A second-order-like optimizer with adaptive gradient scaling for deep learning
Jérôme Bolte, Ryan Boustany, Edouard Pauwels, Andrei Purica
–arXiv.org Artificial Intelligence
In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. After giving geometrical insights, we evaluate INNAprop on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT, and on GPT-2 (OpenWebText) trained from scratch and with LoRA fine-tuning (E2E). INNAprop consistently matches or outperforms AdamW in both training speed and accuracy, with minimal hyperparameter tuning in large-scale settings.

As deep learning models grow in size, training them demands massive computational resources, which raises significant challenges in terms of financial cost, energy consumption, and processing time (Susnjak et al., 2024; Varoquaux et al., 2024). According to the UN's Environment Programme, the Big Tech sector produced between two and three percent of the world's carbon emissions in 2021, and some estimates for 2023 exceed 4% (see the latest Stand.earth report). For instance, training GPT-3 is estimated to have required 1,287 megawatt-hours (MWh) of electricity, equivalent to the annual usage of over 100 U.S. households (Anthony et al., 2020; Patterson et al., 2021). The financial cost of specialized hardware and cloud computing is similarly high: OpenAI stated that training GPT-4 (Achiam et al., 2023) cost more than 100 million dollars.
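The abstract only names the two ingredients (the INNA method and RMSprop scaling) without giving the recursion. As a rough illustration of how such a combination can look in code, the sketch below applies an RMSprop-style second-moment preconditioner to the gradient inside a standard two-variable INNA discretization. This is a minimal sketch under stated assumptions, not the paper's implementation: the class name `INNAPropSketch`, the hyperparameter names (`alpha`, `beta`, `rms_decay`, `eps`), the initialization of the auxiliary variable, and the absence of weight decay are all illustrative choices; the authors' actual INNAprop recursion and its AdamW-style details should be taken from the paper.

```python
import torch
from torch.optim import Optimizer


class INNAPropSketch(Optimizer):
    """Illustrative sketch only: a textbook two-variable INNA update whose
    gradient term is rescaled by an RMSprop running average of squared
    gradients. Not the authors' exact INNAprop recursion."""

    def __init__(self, params, lr=1e-3, alpha=0.5, beta=0.9,
                 rms_decay=0.99, eps=1e-8):
        defaults = dict(lr=lr, alpha=alpha, beta=beta,
                        rms_decay=rms_decay, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr, a, b = group['lr'], group['alpha'], group['beta']
            rho, eps = group['rms_decay'], group['eps']
            for p in group['params']:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if len(state) == 0:
                    # psi: INNA's auxiliary (inertial) variable; initializing
                    # it at the current parameter value is one simple choice.
                    state['psi'] = p.detach().clone()
                    # v: RMSprop running average of squared gradients.
                    state['v'] = torch.zeros_like(p)
                psi, v = state['psi'], state['v']
                # RMSprop adaptive scaling of the raw gradient.
                v.mul_(rho).addcmul_(g, g, value=1 - rho)
                g_hat = g / (v.sqrt() + eps)
                # INNA-style coupled update of (theta, psi), here driven by
                # the rescaled gradient g_hat instead of the raw gradient.
                drift = (1.0 / b - a) * p - (1.0 / b) * psi
                psi.add_(drift, alpha=lr)
                p.add_(drift - b * g_hat, alpha=lr)
        return loss
```

Used like any PyTorch optimizer (e.g. `opt = INNAPropSketch(model.parameters(), lr=1e-3)` followed by the usual `loss.backward(); opt.step(); opt.zero_grad()` loop); in this sketch the preconditioner only rescales the gradient term, whereas the paper derives the precise interaction between the second-order-like dynamics and the adaptive scaling.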
Dec-12-2024