GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen

Neural Information Processing Systems 

In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks.
