Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training