Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement