Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
