Exploring the Benefit of Activation Sparsity in Pre-training
