Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems
