Exascale Deep Learning for Scientific Inverse Problems

Laanait, Nouamane, Romero, Joshua, Yin, Junqi, Young, M. Todd, Treichler, Sean, Starchenko, Vitalii, Borisevich, Albina, Sergeev, Alex, Matheson, Michael

arXiv.org Machine Learning 

We introduce novel communication strategies in synchronous distributed Deep Learning, consisting of decentralized gradient reduction orchestration and computational graph-aware grouping of gradient tensors. With the continued growth of Deep Neural Network (DNN) models and data sets (Dai et al., 2019), the need for efficient distributed machine learning strategies on massively parallel systems is more significant than ever. On small- to moderate-scale systems, with tens to hundreds of GPU/TPU accelerators, scaling inefficiencies can be difficult to detect and systematically optimize due to system noise and load variability. The scaling inefficiencies of data-parallel implementations are most readily apparent on large-scale systems such as supercomputers with thousands to tens of thousands of accelerators. Extending data parallelism to the massive scale of supercomputing systems is also motivated by the latter's traditional workload of scientific numerical simulations (Kent & Kotliar, 2018). The target system's nodes use the NVLink interconnect, supporting a (peak) bidirectional bandwidth of 100 GB/s, with groups of 3 V100 GPUs connected in a ring topology with all-to-all connections to a POWER9 CPU.
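To make the idea of graph-aware grouping of gradient tensors concrete, here is a minimal sketch, not the authors' implementation: gradients are bucketed in the order backpropagation produces them so that each collective reduction moves one fused buffer, and the reduction itself is stood in for by a simple cross-worker average. The function names, the bucket-size threshold, and the toy all-reduce are all illustrative assumptions.

```python
import numpy as np

def bucket_gradients(grads, bucket_bytes=4 * 1024 * 1024):
    """Group gradient tensors (listed in the order backprop emits them,
    i.e. reverse computational-graph order) into buckets of roughly
    bucket_bytes each, so each reduction sends one fused buffer.
    Hypothetical helper; the threshold is an illustrative choice."""
    buckets, current, size = [], [], 0
    for g in grads:
        current.append(g)
        size += g.nbytes
        if size >= bucket_bytes:
            buckets.append(current)
            current, size = [], 0
    if current:
        buckets.append(current)
    return buckets

def allreduce_mean(per_worker_buckets):
    """Toy stand-in for a ring all-reduce: flatten each worker's copy
    of a bucket and average the fused buffers across workers."""
    reduced = []
    for bucket_copies in zip(*per_worker_buckets):
        flats = [np.concatenate([g.ravel() for g in b]) for b in bucket_copies]
        reduced.append(np.mean(flats, axis=0))
    return reduced

# Two toy workers, each holding local gradients for the same two tensors.
worker_a = [np.ones(4), np.full(4, 3.0)]
worker_b = [np.zeros(4), np.full(4, 1.0)]
# A tiny threshold forces one tensor per bucket, making the grouping visible.
buckets = [bucket_gradients(w, bucket_bytes=16) for w in (worker_a, worker_b)]
averaged = allreduce_mean(buckets)
```

Fusing many small tensors into fewer, larger buffers is what amortizes per-message latency; ordering the buckets by when backprop finishes each tensor is what lets the reduction of early buckets overlap with the computation of later gradients.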
