Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing