Supplementary Material
AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning

Neural Information Processing Systems 

As δ is relatively small compared with other time consumption, we treat it as a constant that does not scale with variable size. The second process introduces network overhead (e.g., latency) and network communication. For each v_i ∈ V_CC, we model the 5 most used collective primitives: AllReduce, ReduceScatter, AllGather, Broadcast, and Reduce [12]. The indicators I_1, I_2, I_3 are true when AllReduce, {ReduceScatter, AllGather}, and {Broadcast, Reduce} are activated, respectively. Note that the LSTM walks through each v_i ∈ V_{G,θ} strictly following their original forward (backward) order in the computational graph, so as to inject this ordering information into the modeling. A global load balancer (c_lb) and group assigner (c_am) assign their values using randomized and approximate solutions, illustrated in Algorithm 1 and Algorithm 2, respectively.
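To make the indicator encoding concrete, the following is a minimal sketch (in Python; all names are illustrative, not taken from the released implementation) of building a per-variable feature tuple from the three primitive indicators, walking the variables in their original graph order so that a sequence model such as an LSTM can consume them step by step:

```python
# Hypothetical sketch of the indicator features described above.
# I1 is set for AllReduce; I2 for ReduceScatter and AllGather;
# I3 for Broadcast and Reduce.
PRIMITIVE_INDICATORS = {
    "AllReduce":     (1, 0, 0),
    "ReduceScatter": (0, 1, 0),
    "AllGather":     (0, 1, 0),
    "Broadcast":     (0, 0, 1),
    "Reduce":        (0, 0, 1),
}

def encode_variables(variables):
    """variables: list of (name, primitive, size) tuples, already in the
    graph's forward (backward) order. Returns one feature tuple
    (I1, I2, I3, size) per variable, preserving that order."""
    features = []
    for _name, primitive, size in variables:
        i1, i2, i3 = PRIMITIVE_INDICATORS[primitive]
        features.append((i1, i2, i3, size))
    return features
```

For example, `encode_variables([("w1", "AllReduce", 1024), ("w2", "ReduceScatter", 2048)])` yields `[(1, 0, 0, 1024), (0, 1, 0, 2048)]`, which a downstream sequence model can read in graph order.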
