MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
