Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model
