Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters