Mesh-TensorFlow: Deep Learning for Supercomputers