Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
Feng, Hao, Zhang, Boyuan, Ye, Fanjiang, Si, Min, Chu, Ching-Hsiang, Tian, Jiannan, Yin, Chunxing, Deng, Summer, Hao, Yuchen, Balaji, Pavan, Geng, Tong, Tao, Dingwen
–arXiv.org Artificial Intelligence
Abstract: DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. [...]

Deep Learning Recommendation Models (DLRMs) have risen to prominence in both research and industry in recent years. These models integrate sparse input embedding learning with neural network architectures, marking a notable advance over traditional collaborative filtering-based recommendation systems [1]. DLRMs have been successfully implemented in various industry applications, including product recommendation systems by Amazon [2], personalized [...] As a result, they constitute a significant portion of deep learning applications across multiple industries. DLRMs are uniquely designed to process high-dimensional categorical features, typically represented by one-hot or multi-hot vectors matching the size of the category, which leads to significant data sparsity.

This setup necessitates the use of collective communication primitives for synchronization across all GPUs. Specifically, the partitioning of sparse embedding tables requires nodes to aggregate sparse embedding lookups during forward passes and their corresponding gradients during backward passes. Consequently, all-to-all communication is used in both forward and backward passes to synchronize sparse lookups and gradients, while all-reduce is used to synchronize dense/MLP gradients during the backward pass. [...] gradients across all GPUs during each minibatch iteration significantly [...] For example, Figure 1 shows that all-to-all communication accounts for more than 60% of the total training time for DLRM on an 8-node cluster with 32 A100 GPUs (connected through a Slingshot [...]
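
The roles of the two collectives described above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: "workers" are plain list indices rather than GPUs, no communication library is involved, each worker is assumed to own exactly one embedding table, and the names `all_to_all`, `all_reduce_sum`, `tables`, `indices`, and `samples_of` are hypothetical, chosen for illustration.

```python
# Single-process simulation of all-to-all and all-reduce semantics.
# Assumption: no real GPUs or communication backend; workers are list indices.

def all_to_all(send):
    """send[src][dst] is the chunk worker `src` sends to worker `dst`.
    Returns recv with recv[dst][src] holding that same chunk (a transpose)."""
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]

def all_reduce_sum(values):
    """values[w] is worker w's local vector; every worker receives the
    elementwise sum over all workers."""
    total = [sum(column) for column in zip(*values)]
    return [list(total) for _ in values]

NUM_WORKERS = 2

# Model parallelism (assumed layout): worker w owns embedding table w.
tables = [
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # table 0, stored on worker 0
    [[10.0, 10.0], [20.0, 20.0]],          # table 1, stored on worker 1
]

# indices[s][t] is the categorical index of global sample s for table t.
indices = [[0, 1], [2, 0]]
# Data parallelism: worker w processes the samples listed in samples_of[w].
samples_of = [[0], [1]]

# Forward pass: each worker looks up its own table for every sample and
# groups the resulting embedding rows by the worker that owns the sample.
send = [
    [[tables[w][indices[s][w]] for s in samples_of[dst]]
     for dst in range(NUM_WORKERS)]
    for w in range(NUM_WORKERS)
]

# After the exchange, recv[w][t] holds the rows of table t for worker w's samples.
recv = all_to_all(send)

# Backward pass for the dense/MLP part: each worker has a local gradient,
# and all-reduce leaves every worker with the same summed gradient.
dense_grads = [[1.0, 2.0], [3.0, 4.0]]
reduced = all_reduce_sum(dense_grads)

print(recv)
print(reduced)
```

In this model, the backward all-to-all for embedding gradients is the same exchange run in the opposite direction: applying `all_to_all` to `recv` returns the original `send` layout, so each gradient chunk travels back to the worker that owns the corresponding table.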
Jul-11-2024