HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts