A Comparison with official DTR and DTE implementations

[Figure panel: ResNet-152]

Neural Information Processing Systems 

Figure 1: Training time as the batch size increases under Coop (implemented in OneFlow), MegEngine DTE (the official implementation of DTE in MegEngine), and PyTorch DTR (the official implementation of DTR in PyTorch). Hatched bars indicate out-of-memory (OOM) errors. Coop saves more memory than MegEngine DTE and PyTorch DTR and supports training larger models.

Figure 2: Comparison of compute overhead in Coop when each of its three modules is removed in turn.

Figure 3: Comparison of the averaged memory fragmentation rate in Coop when each of its three modules is removed in turn.