Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
