Communication-minimizing Asynchronous Tensor Parallelism

Siddharth Singh, Zack Sating, Abhinav Bhatele

arXiv.org Artificial Intelligence 

As state-of-the-art neural networks scale to billions of parameters, designing parallel algorithms that can train these networks efficiently on multi-GPU clusters has become critical. This paper presents Tensor3D, a novel three-dimensional (3D) approach to parallelize tensor computations that strives to minimize the idle time incurred due to communication in parallel training of large multi-billion parameter models. First, we introduce an intelligent distribution of neural network parameters across GPUs that eliminates communication required for satisfying data dependencies of individual layers.

In this work, we propose Tensor3D, a three-dimensional (3D) hybrid tensor and data parallel framework which strives to alleviate the performance bottlenecks of existing tensor parallel approaches. Our framework relies on three key ideas to minimize the idle time spent in communication. First, we show how a naive application of a tensor parallel strategy can lead to a significant amount of communication for satisfying the data dependencies of parallelized layers of a neural network. To this end, we propose an intelligent distribution of neural network parameters across GPUs that eliminates the aforementioned communication for satisfying data dependencies.
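The data-dependency communication described above can be illustrated with a small sketch. The following single-process NumPy simulation is an illustrative assumption, not the authors' implementation: the GPU count P, the layer sizes, and the use of sum() as a stand-in for an all-reduce are all hypothetical. It contrasts a naive placement, which needs a reduction after every layer, with an aligned placement that keeps the intermediate activation local and defers communication to a single reduction.

```python
import numpy as np

rng = np.random.default_rng(0)
P = 4                                   # number of simulated GPUs (assumption)
x = rng.standard_normal((8, 64))        # input batch, replicated on every GPU
W1 = rng.standard_normal((64, 128))     # layer-1 weights
W2 = rng.standard_normal((128, 32))     # layer-2 weights

# Naive placement: both layers are split along their input dimension, so each
# GPU holds only a partial sum and an all-reduce (simulated here by sum())
# is needed after *every* layer before the next layer can start.
h = sum(x[:, s] @ W1[s, :] for s in np.split(np.arange(64), P))         # reduce 1
y_naive = sum(h[:, s] @ W2[s, :] for s in np.split(np.arange(128), P))  # reduce 2

# Aligned placement: layer 1 is split along its output dimension and layer 2
# along its input dimension using the same partition, so each GPU's slice of
# the hidden activation feeds its own W2 shard with no communication in
# between; a single reduction remains, at the very end.
parts = np.split(np.arange(128), P)
y_aligned = sum((x @ W1[:, s]) @ W2[s, :] for s in parts)               # reduce 1

assert np.allclose(y_naive, y_aligned)  # identical result, half the reductions
```

In a real tensor parallel framework the sum() calls would be all-reduce collectives over the GPU group; the sketch only shows why the placement of parameter shards, not the total amount of computation, determines how many such collectives sit on the critical path.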
