Variance Control via Weight Rescaling in LLM Pre-training