

 megatronlm


Astra: Efficient and Money-saving Automatic Parallel Strategies Search on Heterogeneous GPUs

arXiv.org Artificial Intelligence

In this paper, we introduce Astra, an efficient and money-saving framework for automatically searching parallel strategies on heterogeneous GPUs. First, Astra searches for the efficiency-optimal parallel strategy across both the GPU configuration search space (GPU types and GPU counts) and the parallel parameter search space. Second, Astra supports heterogeneous GPUs by mathematically modeling the time consumption of heterogeneous training. Finally, Astra is the first framework to propose automatic parallel strategy search that targets money-saving. Experimental results demonstrate that Astra can achieve better throughput than expert-designed strategies. Astra's search time can also be limited to 1.27 seconds in a single-GPU setting and less than 1.35 minutes in a heterogeneous-GPU setting on average, with an accuracy of over 95%.


Energy consumption of AI poses environmental problems

#artificialintelligence

Take some of the most popular language models, for example. OpenAI trained its GPT-3 model on 45 terabytes of data. To train the final version of MegatronLM, a language model similar to but smaller than GPT-3, Nvidia ran 512 V100 GPUs over nine days. A single V100 GPU can consume between 250 and 300 watts. If we assume 250 watts, then 512 V100 GPUs consume 128,000 watts, or 128 kilowatts (kW).
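The arithmetic above can be reproduced, and extended to an energy estimate, with a short Python sketch. The function names are hypothetical. The energy figure rests on assumptions the excerpt does not state: every GPU draws constant power for the full nine days, and CPUs, networking, and cooling are ignored.

```python
def cluster_power_kw(num_gpus: int, watts_per_gpu: float) -> float:
    """Total GPU power draw in kilowatts (GPUs only, no cooling or host overhead)."""
    return num_gpus * watts_per_gpu / 1000.0


def training_energy_kwh(num_gpus: int, watts_per_gpu: float, days: float) -> float:
    """Energy in kilowatt-hours, assuming constant draw for the whole run."""
    hours = 24 * days
    return cluster_power_kw(num_gpus, watts_per_gpu) * hours


# Figures quoted in the text: 512 V100 GPUs, 250 W each, nine days.
power_kw = cluster_power_kw(512, 250)           # 128.0 kW, matching the text
energy_kwh = training_energy_kwh(512, 250, 9)   # 27648.0 kWh under the assumptions above

# Upper end of the quoted 250-300 W range, for comparison.
energy_kwh_high = training_energy_kwh(512, 300, 9)

print(f"Power draw: {power_kw} kW")
print(f"Energy at 250 W per GPU: {energy_kwh} kWh")
print(f"Energy at 300 W per GPU: {energy_kwh_high:.1f} kWh")
```

Under these assumptions the nine-day run comes to roughly 27,600 kWh at 250 W per GPU. Real consumption would differ with GPU utilization and data-center overhead.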


Tiny AI models could supercharge autocorrect and voice assistants on your phone

#artificialintelligence

Researchers have successfully shrunk a giant language model to use in commercial applications. In October of last year, for example, Google released a model called BERT that passed a long-held reading-comprehension benchmark in the field. The larger version of the model had 340 million parameters, and training it just one time through cost enough electricity to power a US household for 50 days. Four months later, OpenAI topped it with its model GPT-2. The model demonstrated an impressive knack for constructing convincing prose; it also used 1.5 billion parameters. Now, MegatronLM, the latest and largest model from Nvidia, has 8.3 billion parameters.