Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference
