GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
