Stanza: Layer Separation for Distributed Training in Deep Learning