RapidGNN: Energy and Communication-Efficient Distributed Training on Large-Scale Graph Neural Networks
Arefin Niam, Tevfik Kosar, M S Q Zulkar Nine
Graph Neural Networks (GNNs) have become popular across a diverse set of tasks that explore structural relationships between entities. However, due to the highly connected structure of these datasets, distributed training of GNNs on large-scale graphs poses significant challenges. Traditional sampling-based approaches mitigate the computational load, yet communication overhead remains a challenge. This paper presents RapidGNN, a distributed GNN training framework that uses deterministic sampling-based scheduling to enable efficient cache construction and prefetching of remote features. Evaluation on benchmark graph datasets demonstrates RapidGNN's effectiveness across different scales and topologies. RapidGNN improves end-to-end training throughput by 2.46x to 3.00x on average over baseline methods across the benchmark datasets, while cutting remote feature fetches by 9.70x to 15.39x. RapidGNN further demonstrates near-linear scalability with an increasing number of computing units. It also improves energy efficiency over the baseline methods by 44% on CPU and 32% on GPU.
arXiv.org Artificial Intelligence
Sep-8-2025
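The core idea the abstract describes, precomputing the sampling schedule from a fixed seed so that each mini-batch's remote feature set is known before training begins, can be illustrated with a minimal sketch. This assumes a dict-of-lists adjacency and single-hop fanout sampling; the function names and parameters (deterministic_schedule, prefetch_plan, fanout, batch_size) are illustrative assumptions, not the paper's API.

```python
import random

def deterministic_schedule(train_nodes, adj, fanout, num_batches,
                           batch_size=64, seed=42):
    """Precompute every mini-batch's neighbor sample with a fixed seed,
    so the full set of node features each batch will touch is known
    before training starts. (Illustrative sketch, not RapidGNN's code.)"""
    rng = random.Random(seed)
    schedule = []
    for _ in range(num_batches):
        batch = rng.sample(train_nodes, k=min(batch_size, len(train_nodes)))
        needed = set(batch)
        for v in batch:
            nbrs = adj.get(v, [])
            needed.update(rng.sample(nbrs, k=min(fanout, len(nbrs))))
        schedule.append((batch, needed))
    return schedule

def prefetch_plan(schedule, local_nodes):
    """For each precomputed batch, list the remote node IDs whose
    features should be prefetched or cached ahead of that batch."""
    return [sorted(needed - local_nodes) for _, needed in schedule]
```

Because the schedule is reproducible from the seed, a worker can overlap fetching batch t+1's remote features with computing batch t, which is the mechanism the abstract credits for the reported reduction in remote feature fetches.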