On the effectiveness of discrete representations in sparse mixture of experts