Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes
