Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering
