Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model