Scaling Diffusion Transformers Efficiently via μP
