A Linear Algebraic Approach to Model Parallelism in Deep Learning