Exploring the Benefit of Activation Sparsity in Pre-training