AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

Neural Information Processing Systems 

Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks. However, training large models requires substantial distributed-systems expertise to carefully design model-parallel execution strategies that suit the model architecture and the cluster setup.