zero-3 offload
Microsoft Releases AI Training Library ZeRO-3 Offload
Microsoft recently open-sourced ZeRO-3 Offload, an extension of their DeepSpeed AI training library that improves memory efficiency while training very large deep-learning models. ZeRO-3 Offload allows users to train models with up to 40 billion parameters on a single GPU and over 2 trillion parameters on 512 GPUs. The DeepSpeed team provided an overview of the features and benefits of the release in a recent blog post. ZeRO-3 Offload increases the memory efficiency of distributed training for deep-learning models built on the PyTorch framework, providing super-linear scaling across multiple GPUs. By offloading the storage of some data from the GPU to the CPU, larger model sizes per GPU can be trained, enabling model sizes up to 40B parameters on a single GPU.