Accelerating Transformer Pre-training with 2:4 Sparsity