A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer
Zhangyang Gao, Daize Dong, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li
Can we model non-Euclidean graphs as pure language, or even as Euclidean vectors, while retaining their inherent information? The non-Euclidean property has posed a long-standing challenge in graph modeling. Although recent GNNs and Graphformers encode graphs as Euclidean vectors, recovering the original graph from those vectors remains a challenge. We introduce GraphsGPT, featuring a Graph2Seq encoder that transforms non-Euclidean graphs into learnable "graph words" in Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from the graph words to ensure information equivalence. Pretraining GraphsGPT on 100M molecules yields several interesting findings: (1) the pretrained Graph2Seq excels at graph representation learning, achieving state-of-the-art results on 8/9 graph classification and regression tasks; (2) the pretrained GraphGPT serves as a strong graph generator, demonstrated by its ability to perform both unconditional and conditional graph generation; (3) Graph2Seq+GraphGPT enables effective graph mixup in Euclidean space, overcoming the previously known non-Euclidean challenge; (4) the proposed novel edge-centric GPT pretraining task is effective for graphs, underscoring its success in both representation and generation.
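The abstract's central mechanism, compressing a variable-size graph into $K$ fixed-size Euclidean "graph words" that then support plain vector operations such as mixup, can be sketched as follows. This is a minimal illustration under stated assumptions: it assumes graphs are pre-flattened into node/edge token embeddings and uses learnable query tokens to produce the $K$ summary vectors. Everything beyond the name Graph2Seq (module internals, shapes, the `euclidean_mixup` helper) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch: K learnable query tokens summarize a graph-token
# sequence into K Euclidean vectors. Hypothetical stand-in, not the
# authors' GraphsGPT code.
import torch
import torch.nn as nn

class Graph2Seq(nn.Module):
    """Encode a variable-length graph token sequence into K Euclidean vectors."""
    def __init__(self, d_model=64, k_words=8, nhead=4, layers=2):
        super().__init__()
        # Learnable "graph word" queries, shared across all graphs.
        self.graph_words = nn.Parameter(torch.randn(k_words, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.k = k_words

    def forward(self, graph_tokens):                  # (B, N, d_model)
        b = graph_tokens.size(0)
        queries = self.graph_words.expand(b, -1, -1)  # (B, K, d_model)
        x = torch.cat([queries, graph_tokens], dim=1)
        # Keep only the K query positions as the Euclidean graph words.
        return self.encoder(x)[:, :self.k]

def euclidean_mixup(words_a, words_b, lam=0.5):
    """Graph mixup in graph-word space: plain linear interpolation,
    an operation the non-Euclidean graph domain does not directly support."""
    return lam * words_a + (1.0 - lam) * words_b

# Usage: two "graphs" of different sizes map to the same fixed-size space,
# so they can be mixed and (per the paper) decoded back by GraphGPT.
enc = Graph2Seq()
g1, g2 = torch.randn(1, 12, 64), torch.randn(1, 30, 64)
w1, w2 = enc(g1), enc(g2)
mixed = euclidean_mixup(w1, w2, lam=0.3)  # (1, 8, 64)
```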
arXiv.org Artificial Intelligence
Feb-4-2024
- Genre:
  - Research Report (0.40)
- Industry:
  - Health & Medicine (0.69)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning > Neural Networks (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (1.00)
    - Communications (0.93)
    - Data Science > Data Mining (0.69)