$μ$-Parametrization for Mixture of Experts
