CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
