Exascale Deep Learning for Scientific Inverse Problems
Laanait, Nouamane; Romero, Joshua; Yin, Junqi; Young, M. Todd; Treichler, Sean; Starchenko, Vitalii; Borisevich, Albina; Sergeev, Alex; Matheson, Michael
We introduce novel communication strategies in synchronous distributed Deep Learning, consisting of decentralized gradient reduction orchestration and computational graph-aware grouping of gradient tensors. With the growth of Deep Neural Network (DNN) models and data sets (Dai et al., 2019), the need for efficient distributed machine learning strategies on massively parallel systems is more significant than ever. The scaling inefficiencies of data-parallel implementations are most readily apparent on large-scale systems such as supercomputers with 1,000s-10,000s of accelerators; on small- to moderate-scale systems, with 10s-100s of GPU/TPU accelerators, these inefficiencies can be difficult to detect and systematically optimize due to system noise and load variability. Extending data-parallelism to the massive scale of supercomputing systems is also motivated by the latter's traditional workload of scientific numerical simulations (Kent & Kotliar, 2018). The target system's nodes use an NVLink interconnect supporting a (peak) bidirectional bandwidth of 100 GB/s, with each group of 3 V100 GPUs arranged in a ring topology and connected all-to-all to a POWER9 CPU.
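The abstract names the two ingredients but not their mechanics. The following is a minimal sketch of computational graph-aware gradient grouping in PyTorch-style Python; it is an illustration under stated assumptions, not the authors' implementation. The bucket size, the helper names bucketed_allreduce and _launch, and the use of torch.distributed are all hypothetical. The full technique launches each fused reduction as soon as the backward pass finishes a bucket, overlapping communication with the remaining gradient computation; for brevity, this sketch launches all buckets asynchronously after backward() completes.

```python
# Minimal sketch (an assumption, not the paper's released code) of
# graph-aware gradient grouping: gradients are bucketed in reverse
# parameter order -- roughly the order backpropagation produces them --
# and each bucket is reduced with one fused, asynchronous all-reduce.
import torch
import torch.distributed as dist

BUCKET_CAP_BYTES = 4 * 1024 * 1024  # assumed 4 MB fusion buffer


def _launch(grads):
    """Flatten one bucket and start an asynchronous all-reduce on it."""
    flat = torch.cat([g.reshape(-1) for g in grads])
    handle = dist.all_reduce(flat, op=dist.ReduceOp.SUM, async_op=True)
    return flat, grads, handle


def bucketed_allreduce(model):
    """Group gradients into fixed-size buckets and all-reduce each bucket."""
    bucket, bucket_bytes, pending = [], 0, []
    # Reverse parameter order approximates the order in which autograd
    # finishes computing gradients during the backward pass.
    for p in reversed(list(model.parameters())):
        if p.grad is None:
            continue
        bucket.append(p.grad)
        bucket_bytes += p.grad.numel() * p.grad.element_size()
        if bucket_bytes >= BUCKET_CAP_BYTES:
            pending.append(_launch(bucket))
            bucket, bucket_bytes = [], 0
    if bucket:
        pending.append(_launch(bucket))
    for flat, grads, handle in pending:
        handle.wait()                      # finish the async reduction
        flat /= dist.get_world_size()      # average across ranks
        # Scatter the averaged values back into the original tensors.
        offset = 0
        for g in grads:
            g.copy_(flat[offset:offset + g.numel()].view_as(g))
            offset += g.numel()
```

On each rank, after dist.init_process_group and loss.backward(), calling bucketed_allreduce(model) before optimizer.step() averages gradients across ranks; fusing many small gradient tensors into a few large reductions is what amortizes the per-message latency that dominates at 1,000s-10,000s of accelerators.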
Sep-24-2019