Activations and Gradients Compression for Model-Parallel Training
