SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures
