$μ$-Parametrization for Mixture of Experts