A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer
Zhangyang Gao, Daize Dong, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li
Can we model non-Euclidean graphs as pure language, or even as Euclidean vectors, while retaining their inherent information? The non-Euclidean property has posed a long-standing challenge in graph modeling. Although recent GNNs and Graphformers encode graphs as Euclidean vectors, recovering the original graph from those vectors remains a challenge. We introduce GraphsGPT, featuring a Graph2Seq encoder that transforms non-Euclidean graphs into learnable "graph words" in Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from the graph words to ensure information equivalence. Pretraining GraphsGPT on 100M molecules yields several interesting findings: (1) the pretrained Graph2Seq excels at graph representation learning, achieving state-of-the-art results on 8/9 graph classification and regression tasks; (2) the pretrained GraphGPT serves as a strong graph generator, demonstrated by its ability to perform both unconditional and conditional graph generation; (3) Graph2Seq+GraphGPT enables effective graph mixup in Euclidean space, overcoming the previously known non-Euclidean challenge; (4) the proposed novel edge-centric GPT pretraining task is effective for graphs, underscoring its success in both representation and generation.
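The abstract's central mechanism, compressing a variable-size graph into $K$ fixed-size Euclidean "graph words" that then support plain vector operations such as mixup, can be sketched as follows. This is a minimal illustration under stated assumptions: it assumes graphs are pre-flattened into node/edge token embeddings and uses learnable query tokens to produce the $K$ summary vectors. Everything beyond the name Graph2Seq (module internals, shapes, the `euclidean_mixup` helper) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch: K learnable query tokens summarize a graph-token
# sequence into K Euclidean vectors. Hypothetical stand-in, not the
# authors' GraphsGPT code.
import torch
import torch.nn as nn

class Graph2Seq(nn.Module):
    """Encode a variable-length graph token sequence into K Euclidean vectors."""
    def __init__(self, d_model=64, k_words=8, nhead=4, layers=2):
        super().__init__()
        # Learnable "graph word" queries, shared across all graphs.
        self.graph_words = nn.Parameter(torch.randn(k_words, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.k = k_words

    def forward(self, graph_tokens):                  # (B, N, d_model)
        b = graph_tokens.size(0)
        queries = self.graph_words.expand(b, -1, -1)  # (B, K, d_model)
        x = torch.cat([queries, graph_tokens], dim=1)
        # Keep only the K query positions as the Euclidean graph words.
        return self.encoder(x)[:, :self.k]

def euclidean_mixup(words_a, words_b, lam=0.5):
    """Graph mixup in graph-word space: plain linear interpolation,
    an operation the non-Euclidean graph domain does not directly support."""
    return lam * words_a + (1.0 - lam) * words_b

# Usage: two "graphs" of different sizes map to the same fixed-size space,
# so they can be mixed and (per the paper) decoded back by GraphGPT.
enc = Graph2Seq()
g1, g2 = torch.randn(1, 12, 64), torch.randn(1, 30, 64)
w1, w2 = enc(g1), enc(g2)
mixed = euclidean_mixup(w1, w2, lam=0.3)  # (1, 8, 64)
```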
arXiv.org Artificial Intelligence
Feb-4-2024
- Genre:
  - Research Report (0.40)
- Industry:
  - Health & Medicine (0.69)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning > Neural Networks (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (1.00)
    - Communications (0.93)
    - Data Science > Data Mining (0.69)