Learning Graph Quantized Tokenizers for Transformers

Limei Wang, Kaveh Hassani, Si Zhang, Dongqi Fu, Baichuan Yuan, Weilin Cong, Zhigang Hua, Hao Wu, Ning Yao, Bo Long

arXiv.org Artificial Intelligence 

Transformers serve as the backbone architectures of Foundation Models, where domain-specific tokenizers help them adapt to various domains. Graph Transformers (GTs) have recently emerged as leading models in geometric deep learning, outperforming Graph Neural Networks (GNNs) on a variety of graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities, with existing approaches relying on heuristics or on GNNs co-trained with Transformers. To address this, we introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. Furthermore, GQT utilizes Residual Vector Quantization (RVQ) to learn hierarchical discrete tokens, resulting in significantly reduced memory requirements and improved generalization. By combining GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 16 out of 18 benchmarks, including large-scale homophilic and heterophilic datasets.

Unlike message-passing Graph Neural Networks (GNNs), which rely on strong locality inductive biases (Battaglia et al., 2018; Veličković et al., 2018; Hou et al., 2020; Hamilton et al., 2017a; Kipf & Welling, 2017), GTs are inherently more expressive due to their ability to capture long-range interactions between nodes (Ma et al., 2023). This is particularly beneficial in heterophilous settings, where the homophily assumption does not hold (Fu et al., 2024). GTs possess expressive power at least equivalent to the 2-Weisfeiler-Lehman (WL) isomorphism test (Kim et al., 2022), which is sufficient for most real-world tasks (Zopf, 2022). This surpasses message-passing GNNs, which are limited to the 1-WL test (Ying et al., 2021a).
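To make the residual quantization step concrete, the sketch below quantizes a single node embedding against a small stack of codebooks, emitting one discrete index per level so that the indices together form a hierarchical token. The codebook sizes, dimensions, and NumPy implementation are illustrative assumptions, not the paper's actual training procedure (which learns the codebooks via self-supervised objectives).

```python
# Minimal sketch of residual vector quantization (RVQ) with fixed codebooks;
# illustrative only, not the GQT implementation.
import numpy as np

def residual_vector_quantize(x, codebooks):
    """Quantize embedding x with a stack of codebooks.

    x:         (d,) node embedding
    codebooks: list of (K, d) arrays, one per quantization level
    Returns the list of code indices (the hierarchical discrete token)
    and the reconstructed embedding.
    """
    residual = x.copy()
    indices, recon = [], np.zeros_like(x)
    for C in codebooks:
        # pick the codeword closest to the current residual
        k = int(np.argmin(np.linalg.norm(C - residual, axis=1)))
        indices.append(k)
        recon += C[k]
        residual -= C[k]  # the next level quantizes what is left over
    return indices, recon

# toy usage: 3 levels, 8 codes each, 16-dimensional embeddings (all assumed)
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 16)) for _ in range(3)]
x = rng.normal(size=16)
tokens, x_hat = residual_vector_quantize(x, codebooks)
print(tokens, np.linalg.norm(x - x_hat))
```

Because each level only needs a small codebook, a node is stored as a handful of integer indices rather than a dense embedding, which is where the memory savings come from.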
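As an illustration of the 1-WL limit on message-passing GNNs, the following sketch runs standard 1-WL color refinement (with a hash-based relabeling, an assumption made for this toy example) on a 6-cycle and on two disjoint triangles; both graphs end up with identical color multisets, so 1-WL, and hence any message-passing GNN bounded by it, cannot distinguish them.

```python
# Minimal sketch of 1-WL color refinement on adjacency-list graphs; illustrative only.
def wl_refine(adj, rounds=3):
    """Relabel each node by hashing its color with the multiset of neighbor colors."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return colors

# A 6-cycle and two disjoint triangles: both are 2-regular graphs on 6 nodes.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
             3: [4, 5], 4: [3, 5], 5: [3, 4]}

same = sorted(wl_refine(hexagon).values()) == sorted(wl_refine(triangles).values())
print(same)  # True: 1-WL cannot tell the two graphs apart
```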