Toward Inference-optimal Mixture-of-Expert Large Language Models
