Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
