DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training

Hongkuan Zhou, Da Zheng, Xiang Song, George Karypis, Viktor Prasanna

Jul-14-2023–arXiv.org Artificial Intelligence

Memory-based Temporal Graph Neural Networks (TGNNs) are powerful tools in dynamic graph representation learning and have demonstrated superior performance in many real-world applications. However, their node memory favors smaller batch sizes, which capture more dependencies among graph events, and it must be maintained synchronously across all trainers. As a result, existing frameworks suffer from accuracy loss when scaling to multiple GPUs. Even worse, the tremendous overhead of synchronizing the node memory makes them impractical to deploy on distributed GPU clusters. In this work, we propose DistTGL, an efficient and scalable solution for training memory-based TGNNs on distributed GPU clusters. DistTGL has three improvements over existing solutions: an enhanced TGNN model, a novel training algorithm, and an optimized system. In experiments, DistTGL achieves near-linear convergence speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
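
The claim that node memory favors smaller batch sizes can be illustrated with a toy count. The sketch below is not taken from the paper: it assumes a simplified batched-training setup in which every event in a batch reads node memory as it stood at the start of the batch, so an event whose endpoint was already touched earlier in the same batch misses that update. The function name `count_missed_dependencies` and the event stream are hypothetical.

```python
from typing import List, Tuple

# (source node, destination node), listed in chronological order
Event = Tuple[int, int]


def count_missed_dependencies(events: List[Event], batch_size: int) -> int:
    """Count endpoint reads that miss an update from an earlier event in the same batch.

    Assumption (not from the paper): all events in a batch read node memory
    as of the start of that batch.
    """
    missed = 0
    for start in range(0, len(events), batch_size):
        seen = set()  # nodes touched by earlier events in this batch
        for src, dst in events[start:start + batch_size]:
            # Each endpoint already seen in this batch would read stale memory.
            missed += (src in seen) + (dst in seen)
            seen.update((src, dst))
    return missed


# Hypothetical toy event stream.
events = [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3), (0, 4)]
results = {b: count_missed_dependencies(events, b) for b in (1, 2, 3, 6)}
print(results)  # {1: 0, 2: 1, 3: 5, 6: 7}
```

Under this assumption, the number of missed dependencies grows with the batch size, which is consistent with the accuracy loss the abstract attributes to scaling up.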

artificial intelligence, machine learning, node memory

Country:
- Europe > United Kingdom (0.04)
- Asia > China (0.04)
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America > United States
  - District of Columbia > Washington (0.04)
  - New York > New York County
    - New York City (0.04)
  - California
    - Santa Clara County > Santa Clara (0.14)
    - Los Angeles County > Los Angeles (0.14)
  - Alaska > Anchorage Municipality
    - Anchorage (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
