Spike No More: Stabilizing the Pre-training of Large Language Models
