Crown, Frame, Reverse: Layer-Wise Scaling Variants for LLM Pre-Training
