Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
