Accelerating Transformer Inference and Training with 2:4 Activation Sparsity