Warmstarting for Scaling Language Models