Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer