Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference
Yinghan Li, Yifei Li, Jiejing Zhang, Bujiao Chen, Xiaotong Chen, Lian Duan, Yejun Jin, Zheng Li, Xuanyu Liu, Haoyu Wang, Wente Wang, Yajie Wang, Jiacheng Yang, Peiyang Zhang, Laiwen Zheng, Wenyuan Yu
arXiv.org Artificial Intelligence
Resource utilization is one of the key factors in fully exploiting the computing power of massively parallel devices such as GPUs. As a common method of improving utilization and reducing overhead, the benefit of batching should not be underestimated [7, 8, 11]. In most cases, it is straightforward to batch regular workloads that share the same type and size and therefore perform similar amounts of computation and memory access. In the CUDA programming model, for example, such regular workloads can be conveniently batched along an additional thread block or grid dimension [15]. Irregular workloads, however, do not fit naturally into this scheme. They may exhibit one or more characteristics that prevent regular batching [1]: variable amounts of computation, special memory access patterns, control flow divergence, and so on. Heterogeneous workloads make batching harder still; by heterogeneous, we mean workloads consisting of different types of operations, e.g., some workloads are reductions while others are element-wise operations. Instead of being batched, irregular workloads are often managed in a task-parallel fashion, where each individual workload is treated as a task and all tasks are dynamically scheduled [1, 19].
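The regular-batching scheme described above can be sketched as a minimal CUDA kernel (an illustration under stated assumptions, not code from the paper; the kernel name and data layout are hypothetical). Here, B identical vector additions of length n are batched along the grid's y dimension, so each y-slot of the grid processes one workload of the batch:

```cuda
// Sketch: batch B regular, same-sized vector additions along grid.y.
// Arrays a, b, c each hold B contiguous segments of n floats.
__global__ void batched_vec_add(const float* a, const float* b,
                                float* c, int n) {
    int batch = blockIdx.y;                        // which workload in the batch
    int i = blockIdx.x * blockDim.x + threadIdx.x; // element within the workload
    if (i < n) {
        int off = batch * n;                       // offset of this workload's segment
        c[off + i] = a[off + i] + b[off + i];
    }
}

// Launch: one grid.y slot per workload, enough grid.x blocks to cover n elements.
// dim3 grid((n + 255) / 256, B);
// batched_vec_add<<<grid, 256>>>(d_a, d_b, d_c, n);
```

This works precisely because every workload has the same size and operation; an irregular batch (variable n per workload, or a mix of reductions and element-wise ops) would break the uniform grid mapping, which is the difficulty the abstract highlights.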
Jan-27-2025