Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
