MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems
Shen, Guan; Zhao, Jieru; Wang, Zeke; Lin, Zhe; Ding, Wenchao; Wu, Chentao; Chen, Quan; Guo, Minyi
arXiv.org Artificial Intelligence
As deep neural networks evolve rapidly, the hardware that runs them is evolving just as quickly. Multi-accelerator systems, a promising way to achieve high scalability at low manufacturing cost, are widely deployed in data centers, cloud platforms, and SoCs. They raise a challenging problem: selecting a suitable combination of accelerators from the available designs and finding efficient strategies for mapping DNNs onto them. To address this, we propose MARS, a novel mapping framework that performs computation-aware accelerator selection and applies communication-aware sharding strategies to maximize parallelism. Experimental results show that MARS reduces latency by 32.2% on average for typical DNN workloads compared with the baseline, and by 59.4% on heterogeneous models compared with the corresponding state-of-the-art method.
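The abstract describes MARS only at a high level. As a rough illustration of the two ideas it names, computation-aware accelerator selection and communication-aware sharding, the toy sketch below exhaustively tries combinations of accelerator designs under an area budget and, for each layer, chooses how many of the selected accelerators to shard across by trading compute time against a fixed per-shard communication cost. This is not the MARS algorithm: the `Design` type, the throughput and area numbers, the linear communication model, and the function names `layer_latency_ms` and `select_and_map` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple


@dataclass
class Design:
    """Hypothetical accelerator design (illustrative, not from the paper)."""
    name: str
    gflops: Dict[str, float]  # layer kind -> assumed sustained GFLOP/s
    area: int                 # assumed abstract area/cost units


def layer_latency_ms(kind: str, work_gflop: float,
                     accels: Sequence[Design], comm_ms_per_peer: float) -> float:
    """Lowest latency for one layer when sharded over the n fastest selected accelerators."""
    ranked = sorted((a.gflops[kind] for a in accels), reverse=True)
    best = float("inf")
    for n in range(1, len(ranked) + 1):
        # Split the work in proportion to throughput so all n shards finish together.
        compute = work_gflop / sum(ranked[:n]) * 1e3
        # Toy communication model (assumption): a fixed cost for every shard beyond the first.
        comm = comm_ms_per_peer * (n - 1)
        best = min(best, compute + comm)
    return best


def select_and_map(layers: List[Tuple[str, float]], designs: Sequence[Design],
                   area_budget: int, comm_ms_per_peer: float):
    """Try every multiset of designs within the area budget; layers run one after another."""
    best_combo, best_total = None, float("inf")
    max_k = area_budget // min(d.area for d in designs)
    for k in range(1, max_k + 1):
        for combo in combinations_with_replacement(designs, k):
            if sum(d.area for d in combo) > area_budget:
                continue
            total = sum(layer_latency_ms(kind, w, combo, comm_ms_per_peer)
                        for kind, w in layers)
            if total < best_total:
                best_combo, best_total = combo, total
    return best_combo, best_total


# Illustrative numbers only.
conv_acc = Design("conv_acc", {"conv": 400.0, "fc": 100.0}, area=2)
gemm_acc = Design("gemm_acc", {"conv": 150.0, "fc": 300.0}, area=2)
layers = [("conv", 2.0), ("conv", 2.0), ("fc", 1.0)]  # (layer kind, GFLOP)

combo, ms = select_and_map(layers, [conv_acc, gemm_acc], area_budget=4, comm_ms_per_peer=0.5)
print([d.name for d in combo], round(ms, 3))  # ['conv_acc', 'gemm_acc'] 11.273
```

With these made-up numbers, the mixed pair (one convolution-oriented and one GEMM-oriented design) beats two copies of either design, the kind of heterogeneous selection the abstract motivates. A realistic mapper would replace the exhaustive loop with a smarter search and model communication per sharding strategy rather than as a constant per shard.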
Jul-23-2023