On Model Parallelization and Scheduling Strategies for Distributed Machine Learning