Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition

Gao, Xinyi, Chen, Tong, Zhang, Wentao, Yu, Junliang, Ye, Guanhua, Nguyen, Quoc Viet Hung, Yin, Hongzhi

May-22-2024–arXiv.org Artificial Intelligence

The increasing prevalence of large-scale graphs poses a significant challenge for graph neural network (GNN) training, owing to their substantial computational requirements. In response, graph condensation (GC) emerges as a promising data-centric solution that aims to substitute the large graph with a small yet informative condensed graph to facilitate data-efficient GNN training. However, existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources and training time. In this paper, we revisit existing GC optimization strategies and identify two pervasive issues therein: (1) various GC optimization strategies converge to class-level node feature matching between the original and condensed graphs, making the optimization target coarse-grained despite the complex computations; (2) to bridge the original and condensed graphs, existing GC methods rely on a Siamese graph network architecture that requires time-consuming bi-level optimization with iterative gradient computations. To overcome these issues, we propose an efficient, training-free GC framework termed Class-partitioned Graph Condensation (CGC), which refines the node feature matching from the class-to-class paradigm into a novel class-to-node paradigm. Remarkably, this refinement also simplifies the GC optimization into a class partition problem, which can be efficiently solved by any clustering method. Moreover, CGC incorporates a pre-defined graph structure to enable a closed-form solution for condensed node features, eliminating the need for the back-and-forth gradient descent of existing GC approaches without sacrificing accuracy. Extensive experiments demonstrate that CGC achieves state-of-the-art performance with a more efficient condensation process. For instance, compared with the seminal GC method (i.e., GCond), CGC condenses the largest Reddit graph within 10 seconds, achieving a 2,680× speedup and a 1.4% accuracy increase.
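
To make the class partition idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain k-means as the clustering method and takes each cluster centroid as one condensed node feature, so every condensed node matches one partition of its class rather than the whole class. It works on raw feature vectors only: the pre-defined graph structure and the closed-form feature solution described in the abstract are omitted. All function names (`kmeans`, `condense_by_class_partition`) and parameters are hypothetical.

```python
import random


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assign = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = mean(members)
    return assign, centroids


def condense_by_class_partition(features, labels, nodes_per_class):
    """Partition each class into clusters; one condensed node per cluster.

    Assumption: the centroid of a partition stands in for the condensed
    node feature. No gradient-based training is involved.
    """
    condensed_x, condensed_y = [], []
    for c in sorted(set(labels)):
        class_points = [x for x, y in zip(features, labels) if y == c]
        k = min(nodes_per_class, len(class_points))
        _, centroids = kmeans(class_points, k)
        condensed_x.extend(centroids)
        condensed_y.extend([c] * k)
    return condensed_x, condensed_y


if __name__ == "__main__":
    # Toy data: class 0 has two well-separated groups, class 1 has two nodes.
    features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0],
                [5.0, 5.0], [5.0, 6.0]]
    labels = [0, 0, 0, 0, 1, 1]
    x_small, y_small = condense_by_class_partition(features, labels, 2)
    print(x_small, y_small)
```

Any other clustering routine could replace `kmeans` here, since the partition step only needs a cluster assignment for the nodes of each class.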

artificial intelligence, machine learning, optimization problem

Country:
- Europe (0.67)
- North America > United States
  - California > Los Angeles County > Long Beach (0.14)

Genre:
- Research Report (0.50)

Industry:
- Information Technology (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks (0.87)
    - Statistical Learning > Clustering (0.34)
  - Representation & Reasoning > Optimization (0.93)
