Towards Understanding the Mixture-of-Experts Layer in Deep Learning
