Improving Routing in Sparse Mixture of Experts with Graph of Tokens