MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs
Aishwarya Sarkar, Sayan Ghosh, Nathan R. Tallent, Ali Jannesari
Graph Neural Networks (GNNs) are indispensable for learning from graph-structured data, yet their rising computational cost, especially on massively connected graphs, poses significant challenges to execution performance. To tackle this, distributed-memory solutions such as partitioning the graph and concurrently training multiple GNN replicas are in common use. However, approaches that require a partitioned graph usually suffer from communication overhead and load imbalance, even under optimal partitioning and communication strategies, owing to irregularities in neighborhood minibatch sampling. This paper proposes practical trade-offs for reducing the sampling and communication overheads of representation learning on distributed graphs (using the popular GraphSAGE architecture) by developing a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework, demonstrating about a 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer across various OGB datasets.
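To make the idea of a parameterized prefetch-and-eviction scheme concrete, here is a minimal Python sketch of a local cache for remote ("halo") node features in a partitioned graph. It is an illustration only, not the MassiveGNN or DistDGL implementation; the class name, the `fetch_remote` callable, and the `capacity`/`alpha` parameters are all hypothetical stand-ins for whatever RPC and tuning knobs the actual system uses.

```python
# Hedged sketch: a parameterized prefetch-and-eviction cache for remote
# ("halo") node features in distributed minibatch sampling.
# All names here are hypothetical; `fetch_remote` stands in for the RPC
# that pulls features from the partition that owns the nodes.

import numpy as np


class HaloFeatureCache:
    def __init__(self, capacity, alpha=0.5):
        self.capacity = capacity   # max number of remote nodes kept locally
        self.alpha = alpha         # decay parameter steering eviction
        self.features = {}         # node id -> feature vector
        self.score = {}            # node id -> access score

    def lookup(self, node_ids, fetch_remote):
        """Return features for node_ids, pulling misses via fetch_remote."""
        misses = [n for n in node_ids if n not in self.features]
        if misses:
            for n, f in zip(misses, fetch_remote(misses)):
                self.features[n] = f
        out = []
        for n in node_ids:
            self.score[n] = self.score.get(n, 0.0) + 1.0
            out.append(self.features[n])
        self._evict()
        return np.stack(out)

    def prefetch(self, likely_ids, fetch_remote):
        """Eagerly pull features expected in upcoming minibatches."""
        new = [n for n in likely_ids if n not in self.features]
        if new:
            for n, f in zip(new, fetch_remote(new)):
                self.features[n] = f
                self.score.setdefault(n, 0.0)
        self._evict()

    def _evict(self):
        """Decay scores and drop the lowest-scoring nodes once over capacity."""
        for n in self.score:
            self.score[n] *= self.alpha
        while len(self.features) > self.capacity:
            victim = min(self.features, key=lambda n: self.score.get(n, 0.0))
            del self.features[victim]
            self.score.pop(victim, None)
```

In such a scheme, `prefetch` would overlap communication with computation between minibatches, while the score decay (`alpha`) and `capacity` trade cache hit rate against memory, which is the kind of tunable trade-off the abstract refers to.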
arXiv.org Artificial Intelligence
Nov-3-2024
- Country:
- North America > United States > Iowa (0.14)
- Genre:
- Research Report (1.00)
- Industry:
- Energy (0.34)
- Government (0.46)