Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed
This blog was written in collaboration with the DeepSpeed team, the Azure ML team, and the Azure HPC team at Microsoft.

Large-scale transformer-based deep learning models trained on large amounts of data have shown great results in recent years on several cognitive tasks, and they are behind new products and features that augment human capabilities. These models have grown several orders of magnitude in size over the last five years, from the few million parameters of the original transformer model all the way to the latest 530-billion-parameter Megatron-Turing model (MT-NLG 530B), as shown in Figure 1. There is a growing need among customers to train and fine-tune large models at an unprecedented scale.
July 26, 2022