Foundation Transformers

Wang, Hongyu, Ma, Shuming, Huang, Shaohan, Dong, Li, Wang, Wenhui, Peng, Zhiliang, Wu, Yu, Bajaj, Payal, Singhal, Saksham, Benhaim, Alon, Patra, Barun, Liu, Zhun, Chaudhary, Vishrav, Song, Xia, Wei, Furu

Oct-19-2022–arXiv.org Artificial Intelligence

A big convergence of model architectures across language, vision, speech, and multimodal is emerging. However, under the same name "Transformers", the above areas use different implementations for better performance, e.g., Post-LayerNorm for BERT, and Pre-LayerNorm for GPT and vision Transformers. We call for the development of Foundation Transformer for true general-purpose modeling, which serves as a go-to architecture for various tasks and modalities with guaranteed training stability. In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal. Specifically, we propose Sub-LayerNorm for good expressivity, and the initialization strategy theoretically derived from DeepNet for stable scaling up. Extensive experiments demonstrate its superior performance and better stability than the de facto Transformer variants designed for various applications, including language modeling (i.e., BERT, and GPT), machine translation, vision pretraining (i.e., BEiT), speech recognition, and multimodal pretraining (i.e., BEiT-3).

agneto, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

Oct-19-2022

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Hawaii > Honolulu County > Honolulu (0.04)
- Europe
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Austria > Styria
    - Graz (0.04)
- Asia > Japan
  - Kyūshū & Okinawa > Okinawa (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.87)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found