CARL-G: Clustering-Accelerated Representation Learning on Graphs
Shiao, William; Saini, Uday Singh; Liu, Yozen; Zhao, Tong; Shah, Neil; Papalexakis, Evangelos E.
arXiv.org Artificial Intelligence
Self-supervised learning on graphs has made large strides in achieving great performance in various downstream tasks. However, many state-of-the-art methods suffer from a number of impediments that prevent them from realizing their full potential. For instance, contrastive methods typically require negative sampling, which is often computationally costly. While non-contrastive methods avoid this expensive step, most existing methods rely either on overly complex architectures or on dataset-specific augmentations. In this paper, we ask: can we borrow from the classical unsupervised machine learning literature to overcome those obstacles? We are guided by our key insight that the goal of distance-based clustering closely resembles that of contrastive learning: both attempt to pull representations of similar items together and push those of dissimilar items apart. As a result, we propose CARL-G, a novel clustering-based framework for graph representation learning that uses a loss inspired by Cluster Validation Indices (CVIs), i.e., internal measures of cluster quality (no ground truth required). CARL-G is adaptable to different clustering methods and CVIs, and we show that with the right choice of clustering method and CVI, CARL-G outperforms node classification baselines on 4/5 datasets with up to a 79x training speedup compared to the best-performing baseline. CARL-G also performs on par with or better than baselines in node clustering and similarity search tasks, training up to 1,500x faster than the best-performing baseline. Finally, we also provide theoretical foundations for the use of CVI-inspired losses in graph representation learning.
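To make the idea of a CVI-inspired loss concrete, here is a minimal sketch of how an internal cluster quality measure can be turned into a training signal. The abstract does not say which clustering method or CVI is used, so the choices below (plain k-means and a centroid-based "simplified silhouette") are assumptions for illustration only. The function names `kmeans` and `simplified_silhouette` are hypothetical, and the random `emb` array stands in for node embeddings that would normally come from a graph encoder.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed clustering choice). Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1), centroids


def simplified_silhouette(x, labels, centroids, eps=1e-12):
    """Centroid-based silhouette: needs no ground-truth labels, higher is better."""
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)  # (n, k)
    idx = np.arange(len(x))
    a = d[idx, labels]            # distance to own centroid (pull together)
    d_other = d.copy()
    d_other[idx, labels] = np.inf
    b = d_other.min(axis=1)       # distance to nearest other centroid (push apart)
    return float(np.mean((b - a) / (np.maximum(a, b) + eps)))


# Stand-in "embeddings": two well-separated blobs in 8 dimensions.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.1, (50, 8)), rng.normal(3.0, 0.1, (50, 8))])

labels, centroids = kmeans(emb, k=2)
# Minimising the negated index rewards tight, well-separated clusters.
loss = -simplified_silhouette(emb, labels, centroids)
print(loss)
```

This sketch only evaluates the loss value. In actual training the same quantity would be computed in an automatic differentiation framework so that its gradient can update the encoder. Note that nothing here samples negative pairs: the cluster centroids play the role of the contrast.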
Jul-31-2023
- Country:
  - Asia > Singapore (0.04)
  - Oceania > Australia
    - Queensland (0.04)
  - North America > United States
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.06)
    - New Jersey > Essex County
      - Newark (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.05)
    - California
      - Riverside County > Riverside (0.14)
      - San Francisco County > San Francisco (0.14)
      - San Diego County > San Diego (0.04)
      - Los Angeles County
        - Long Beach (0.05)
        - Santa Monica (0.04)
  - Europe
    - France (0.04)
    - United Kingdom > England
      - Greater London > London (0.04)
    - Spain > Galicia
      - Madrid (0.04)
    - Italy > Emilia-Romagna
      - Metropolitan City of Bologna > Bologna (0.04)
- Genre:
  - Research Report > Promising Solution (0.66)
- Industry:
- Technology: