Stable Language Model Pre-training by Reducing Embedding Variability
