$\mu$nit Scaling: Simple and Scalable FP8 LLM Training
