Improving Routing in Sparse Mixture of Experts with Graph of Tokens
