Mnemosyne: Learning to Train Transformers with Transformers

Neural Information Processing Systems

In this work, we propose a new class of learnable optimizers, called Mnemosyne. It is based on novel spatio-temporal low-rank implicit-attention Transformers that can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) outperforms popular LSTM optimizers (also with new feature engineering that mitigates the catastrophic forgetting of LSTMs), (b) can successfully train Transformers using simple meta-training strategies that require minimal computational resources, and (c) matches the accuracy of SOTA hand-designed optimizers with carefully tuned hyperparameters (often producing top-performing models). Furthermore, Mnemosyne provides space complexity comparable to that of its hand-designed first-order counterparts, which allows it to scale to training larger sets of parameters. We conduct an extensive empirical evaluation of Mnemosyne on: (a) fine-tuning a wide range of Vision Transformers (ViTs), from medium-size architectures to massive ViT-Hs (36 layers, 16 heads), (b) pre-training BERT models, and (c) soft prompt-tuning large 11B+ T5XXL models. We complement our results with a comprehensive theoretical analysis of the compact associative memory used by Mnemosyne, which we believe has not been done before.
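The abstract does not spell out the architecture, but the general recipe it names, low-rank (linear) implicit attention paired with a compact associative memory, can be sketched. Below is a minimal, illustrative Python sketch of a learned optimizer that attends over a running summary of per-parameter gradient features. Every concrete choice here (the feature set, the map phi, the projections W_q/W_k/W_v, all dimensions) is an assumption made for illustration, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

D_FEAT = 4   # gradient features per step (g, sign(g), |g|, momentum) -- illustrative
D_RANK = 8   # low-rank feature dimension of the implicit attention -- illustrative

# Stand-ins for meta-learned projections; a real learned optimizer would
# obtain these from meta-training, not random initialization.
W_q = rng.normal(size=(D_FEAT, D_RANK)) / np.sqrt(D_FEAT)
W_k = rng.normal(size=(D_FEAT, D_RANK)) / np.sqrt(D_FEAT)
W_v = rng.normal(size=(D_FEAT, 1)) / np.sqrt(D_FEAT)

def phi(x):
    # Positive feature map that linearizes attention:
    # softmax(q k^T) v is approximated by phi(q) (sum_t phi(k_t) v_t^T).
    return np.exp(x - 0.5 * np.sum(x**2, axis=-1, keepdims=True))

class LinearAttentionOptimizer:
    """Keeps an O(D_RANK) running summary per parameter instead of the full
    gradient history -- the 'compact associative memory' idea."""
    def __init__(self, n_params, lr=1e-3):
        self.lr = lr
        self.KV = np.zeros((n_params, D_RANK, 1))  # running sum of phi(k) v^T
        self.K1 = np.zeros((n_params, D_RANK))     # running sum of phi(k)
        self.m = np.zeros(n_params)                # momentum feature

    def step(self, params, grads):
        self.m = 0.9 * self.m + 0.1 * grads
        feats = np.stack([grads, np.sign(grads), np.abs(grads), self.m], axis=-1)
        q, k, v = feats @ W_q, feats @ W_k, feats @ W_v
        pq, pk = phi(q), phi(k)
        # Write the new (key, value) pair into the compact associative memory.
        self.KV += pk[:, :, None] * v[:, None, :]
        self.K1 += pk
        # Attention readout = normalized memory lookup with the current query.
        num = np.einsum('nr,nro->no', pq, self.KV)[:, 0]
        den = np.einsum('nr,nr->n', pq, self.K1) + 1e-8
        return params - self.lr * (num / den)

# Toy usage on f(w) = 0.5 * ||w||^2 (grad is w). With random, un-meta-trained
# projections the readout is not a descent direction; this only exercises
# the mechanics and the O(rank) state.
opt = LinearAttentionOptimizer(n_params=10, lr=1e-2)
w = rng.normal(size=10)
for _ in range(100):
    w = opt.step(w, grads=w)
```

The design point the sketch illustrates: because attention is linearized through phi, the optimizer stores only a rank-sized running summary per parameter rather than the full gradient history, which is what makes a space complexity comparable to hand-designed first-order optimizers plausible.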


Scaling Training of HuggingFace Transformers With Determined

#artificialintelligence

Training complex state-of-the-art natural language processing (NLP) models is now far easier thanks to HuggingFace, whose open-source library has become the go-to for data scientists and machine learning engineers who want to implement and configure state-of-the-art Transformer models with straightforward library calls. As a result, the library has become a staple for training NLP models at companies such as Baidu and Alibaba, and has contributed to state-of-the-art results on several NLP tasks. Our friends at Determined AI are hosting an exciting lunch-and-learn covering how to train HuggingFace Transformers at scale using Determined! Learn to train Transformers with distributed training, hyperparameter searches, and cheap spot instances -- all without modifying your code. Please consider joining on Wednesday, June 30th at 10 AM PT for a hands-on tutorial from Liam Li, a Senior Machine Learning Engineer at Determined AI, and Angela Jiang, a Product Manager at Determined AI (lunch included!).
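For context on what the session builds on, here is a minimal single-process fine-tuning sketch using the HuggingFace Trainer API; the checkpoint and dataset names are illustrative choices, not taken from the announcement. Determined's value-add is layering distributed training, hyperparameter search, and spot-instance management on top of a loop like this without code changes.

```python
# Minimal HuggingFace fine-tuning sketch (illustrative model/dataset choices).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize a labeled text-classification dataset.
dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)
dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=2e-5,
)

# Small subsample so the sketch runs quickly on one GPU.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),
)
trainer.train()
```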