Accelerating Transformer Pre-training with 2:4 Sparsity
