Accelerating MoE Model Inference with Expert Sharding
