Exascale Deep Learning for Scientific Inverse Problems

Laanait, Nouamane, Romero, Joshua, Yin, Junqi, Young, M. Todd, Treichler, Sean, Starchenko, Vitalii, Borisevich, Albina, Sergeev, Alex, Matheson, Michael

arXiv.org Machine Learning 

We introduce novel communication strategies in synchronous distributed Deep Learning, consisting of decentralized gradient reduction orchestration and computational graph-aware grouping of gradient tensors. With the continued growth of Deep Neural Network (DNN) models and data sets (Dai et al., 2019), the need for efficient distributed machine learning strategies on massively parallel systems is more significant than ever. On small- to moderate-scale systems, with tens to hundreds of GPU/TPU accelerators, scaling inefficiencies can be difficult to detect and systematically optimize due to system noise and load variability. The scaling inefficiencies of data-parallel implementations are most readily apparent on large-scale systems such as supercomputers with thousands to tens of thousands of accelerators. Extending data parallelism to the massive scale of supercomputing systems is also motivated by the latter's traditional workload of scientific numerical simulations (Kent & Kotliar, 2018). The target system's nodes use the NVLink interconnect, supporting a (peak) bidirectional bandwidth of 100 GB/s, with groups of 3 V100 GPUs connected in a ring topology with all-to-all connections to a POWER9 CPU.
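To make the idea of graph-aware grouping of gradient tensors concrete, here is a minimal sketch, not the authors' implementation: gradients are bucketed in the order backpropagation produces them so that each collective reduction moves one fused buffer, and the reduction itself is stood in for by a simple cross-worker average. The function names, the bucket-size threshold, and the toy all-reduce are all illustrative assumptions.

```python
import numpy as np

def bucket_gradients(grads, bucket_bytes=4 * 1024 * 1024):
    """Group gradient tensors (listed in the order backprop emits them,
    i.e. reverse computational-graph order) into buckets of roughly
    bucket_bytes each, so each reduction sends one fused buffer.
    Hypothetical helper; the threshold is an illustrative choice."""
    buckets, current, size = [], [], 0
    for g in grads:
        current.append(g)
        size += g.nbytes
        if size >= bucket_bytes:
            buckets.append(current)
            current, size = [], 0
    if current:
        buckets.append(current)
    return buckets

def allreduce_mean(per_worker_buckets):
    """Toy stand-in for a ring all-reduce: flatten each worker's copy
    of a bucket and average the fused buffers across workers."""
    reduced = []
    for bucket_copies in zip(*per_worker_buckets):
        flats = [np.concatenate([g.ravel() for g in b]) for b in bucket_copies]
        reduced.append(np.mean(flats, axis=0))
    return reduced

# Two toy workers, each holding local gradients for the same two tensors.
worker_a = [np.ones(4), np.full(4, 3.0)]
worker_b = [np.zeros(4), np.full(4, 1.0)]
# A tiny threshold forces one tensor per bucket, making the grouping visible.
buckets = [bucket_gradients(w, bucket_bytes=16) for w in (worker_a, worker_b)]
averaged = allreduce_mean(buckets)
```

Fusing many small tensors into fewer, larger buffers is what amortizes per-message latency; ordering the buckets by when backprop finishes each tensor is what lets the reduction of early buckets overlap with the computation of later gradients.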
