
Collaborating Authors: Yuan, Baichuan


Learning Graph Quantized Tokenizers for Transformers

arXiv.org Artificial Intelligence

Transformers serve as the backbone architectures of Foundational Models, where a domain-specific tokenizer helps them adapt to various domains. Graph Transformers (GTs) have recently emerged as a leading model in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities, with existing approaches relying on heuristics or GNNs co-trained with Transformers. To address this, we introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging multitask graph self-supervised learning, yielding robust and generalizable graph tokens. Furthermore, the GQT utilizes Residual Vector Quantization (RVQ) to learn hierarchical discrete tokens, resulting in significantly reduced memory requirements and improved generalization capabilities. By combining the GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 16 out of 18 benchmarks, including large-scale homophilic and heterophilic datasets.

Unlike message-passing Graph Neural Networks (GNNs), which rely on strong locality inductive biases (Battaglia et al., 2018; Veličković et al., 2018; Hou et al., 2020; Hamilton et al., 2017a; Kipf & Welling, 2017), GTs are inherently more expressive due to their ability to capture long-range interactions between nodes (Ma et al., 2023). This is particularly beneficial in heterophilous settings, where local alignment does not hold (Fu et al., 2024). GTs possess an expressive power at least equivalent to the 2-Weisfeiler-Lehman (WL) isomorphism test (Kim et al., 2022), which is sufficient for most real-world tasks (Zopf, 2022). This surpasses the expressive power of message-passing GNNs, which are limited to the 1-WL test (Ying et al., 2021a).
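
The abstract does not spell out how the hierarchical discrete tokens are produced, but the core Residual Vector Quantization idea it names (quantizing an embedding against a sequence of codebooks, each operating on the residual left by the previous level) can be sketched roughly as below. The codebook size, depth, straight-through estimator, and PyTorch framing are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of Residual Vector Quantization (RVQ) for node embeddings.
# Codebook size, number of levels, and the straight-through estimator are
# assumptions for illustration; they are not taken from the GQT paper.
import torch
import torch.nn as nn


class ResidualVQ(nn.Module):
    def __init__(self, dim: int, codebook_size: int = 256, num_levels: int = 3):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_levels)
        )

    def forward(self, z: torch.Tensor):
        """Quantize node embeddings z of shape (num_nodes, dim)."""
        residual = z
        quantized = torch.zeros_like(z)
        codes = []
        for codebook in self.codebooks:
            # Nearest codeword for the current residual.
            dists = torch.cdist(residual, codebook.weight)  # (N, K)
            idx = dists.argmin(dim=-1)                       # (N,)
            selected = codebook(idx)                         # (N, dim)
            quantized = quantized + selected
            residual = residual - selected
            codes.append(idx)
        # Straight-through estimator so gradients can flow to an upstream encoder.
        quantized = z + (quantized - z).detach()
        # Each node ends up represented by a short sequence of discrete codes.
        return quantized, torch.stack(codes, dim=-1)


if __name__ == "__main__":
    z = torch.randn(10, 64)        # 10 toy node embeddings
    rvq = ResidualVQ(dim=64)
    z_q, codes = rvq(z)
    print(z_q.shape, codes.shape)  # torch.Size([10, 64]) torch.Size([10, 3])
```

The hierarchical aspect comes from later levels refining the residual error of earlier ones, so coarse structure is captured by the first codebook and finer detail by subsequent ones.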


Do We Really Need Complicated Model Architectures For Temporal Networks?

arXiv.org Artificial Intelligence

Recurrent neural networks (RNNs) and the self-attention mechanism (SAM) are the de facto methods for extracting spatial-temporal information in temporal graph learning. Interestingly, we find that although both RNN and SAM can lead to good performance, in practice neither of them is always necessary. In this paper, we propose GraphMixer, a conceptually and technically simple architecture that consists of three components: (1) a link-encoder based only on multi-layer perceptrons (MLPs) to summarize the information from temporal links, (2) a node-encoder based only on neighbor mean-pooling to summarize node information, and (3) an MLP-based link classifier that performs link prediction based on the outputs of the two encoders. These results motivate us to rethink the importance of simpler model architectures.

In recent years, temporal graph learning has been recognized as an important machine learning problem and has become the cornerstone behind a wealth of high-impact applications (Yu et al., 2018; Bui et al., 2021; Kazemi et al., 2020; Zhou et al., 2020; Cong et al., 2021b). Temporal link prediction is a classic downstream task that focuses on predicting future interactions among nodes. For example, in an ads ranking system, user-ad clicks can be modeled as a temporal bipartite graph whose nodes represent users and ads, and whose links carry timestamps indicating when users clicked ads; link prediction then amounts to predicting whether a user will click a given ad. Designing graph learning models that can capture node evolutionary patterns and accurately predict future links is a crucial direction for many real-world recommender systems. In temporal graph learning, the RNN and SAM have become the de facto standard (Kumar et al., 2019; Sankar et al., 2020; Xu et al., 2020; Rossi et al., 2020; Wang et al., 2020), and the majority of existing works focus on designing neural architectures that combine one of them with additional components to learn representations from raw data. Although powerful, these methods are conceptually and technically complicated, with advanced model architectures.
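
The three-component design can be sketched loosely as below. The dimensions, the fixed number of sampled recent links, and the plain-MLP link encoder are illustrative assumptions (the paper's link encoder is built on an MLP-Mixer over recent temporal links), so this is a structural sketch rather than the exact GraphMixer architecture.

```python
# Rough sketch of a GraphMixer-style model: (1) an MLP-only link encoder,
# (2) a mean-pooling node encoder, (3) an MLP link classifier. Shapes and the
# plain-MLP link encoder are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn


class GraphMixerSketch(nn.Module):
    def __init__(self, link_feat_dim: int, node_feat_dim: int,
                 hidden_dim: int = 128, num_recent_links: int = 20):
        super().__init__()
        # (1) Link encoder: MLP over each node's most recent temporal links.
        self.link_encoder = nn.Sequential(
            nn.Linear(num_recent_links * link_feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # (3) Link classifier: MLP over the two endpoint representations.
        self.classifier = nn.Sequential(
            nn.Linear(2 * (hidden_dim + node_feat_dim), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode_node(self, recent_link_feats, neighbor_feats):
        # recent_link_feats: (batch, num_recent_links, link_feat_dim)
        # neighbor_feats:    (batch, num_neighbors, node_feat_dim)
        link_summary = self.link_encoder(recent_link_feats.flatten(1))
        # (2) Node encoder: simple mean-pooling over neighbor features.
        node_summary = neighbor_feats.mean(dim=1)
        return torch.cat([link_summary, node_summary], dim=-1)

    def forward(self, src_inputs, dst_inputs):
        h_src = self.encode_node(*src_inputs)
        h_dst = self.encode_node(*dst_inputs)
        return self.classifier(torch.cat([h_src, h_dst], dim=-1))  # link logit


if __name__ == "__main__":
    model = GraphMixerSketch(link_feat_dim=4, node_feat_dim=8)
    src = (torch.randn(2, 20, 4), torch.randn(2, 5, 8))
    dst = (torch.randn(2, 20, 4), torch.randn(2, 5, 8))
    print(model(src, dst).shape)  # torch.Size([2, 1])
```

The point of the sketch is simply that no recurrence or attention appears anywhere: both encoders are fixed-size MLP/pooling operations.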


Multivariate Spatiotemporal Hawkes Processes and Network Reconstruction

arXiv.org Machine Learning

There is often latent network structure in spatial and temporal data, and the tools of network analysis can yield fascinating insights into such data. In this paper, we develop a nonparametric method for network reconstruction from spatiotemporal data sets using multivariate Hawkes processes. In contrast to prior work on network reconstruction with point-process models, which has often focused on exclusively temporal information, our approach uses both temporal and spatial information and does not assume a specific parametric form of network dynamics. This leads to an effective way of recovering an underlying network. We illustrate our approach using both synthetic networks and networks constructed from real-world data sets (a location-based social media network, a narrative of crime events, and violent gang crimes). Our results demonstrate that, in comparison to using only temporal data, our spatiotemporal approach yields improved network reconstruction, providing a basis for meaningful subsequent analysis of the reconstructed networks, such as community structure and motif analysis.
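
The paper's estimation is nonparametric, but the multivariate spatiotemporal Hawkes intensity it builds on can be illustrated with standard parametric kernels. The exponential temporal kernel and Gaussian spatial kernel below, along with all numbers, are assumptions chosen only for the example; in network reconstruction the excitation matrix `alpha` plays the role of the inferred edge weights.

```python
# Illustrative evaluation of a multivariate spatiotemporal Hawkes intensity
#   lambda_u(t, x) = mu_u + sum_{t_i < t} alpha[u_i, u] * g(t - t_i) * f(x - x_i)
# The exponential temporal kernel g and Gaussian spatial kernel f are standard
# parametric choices used only for illustration; the paper estimates the
# triggering kernels nonparametrically.
import numpy as np


def intensity(u, t, x, events, mu, alpha, beta=1.0, sigma=1.0):
    """Conditional intensity of node u at time t and 2-D location x.

    events: list of (node_index, time, location) triples observed before t
    mu:     background rates, shape (num_nodes,)
    alpha:  excitation matrix; alpha[i, j] = influence of node i on node j
    """
    lam = mu[u]
    for (ui, ti, xi) in events:
        if ti >= t:
            continue
        g = beta * np.exp(-beta * (t - ti))                        # temporal decay
        d2 = np.sum((np.asarray(x) - np.asarray(xi)) ** 2)
        f = np.exp(-d2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)  # spatial decay
        lam += alpha[ui, u] * g * f
    return lam


if __name__ == "__main__":
    mu = np.array([0.1, 0.2])
    alpha = np.array([[0.5, 0.3], [0.0, 0.4]])  # inferred matrix ~ network edges
    events = [(0, 1.0, (0.0, 0.0)), (1, 2.5, (1.0, 1.0))]
    print(intensity(u=1, t=3.0, x=(0.5, 0.5), events=events, mu=mu, alpha=alpha))
```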


Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

arXiv.org Machine Learning

We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scale framework seamlessly couples two major components: a self-exciting point process that models the macroscale statistical behavior of the ST data, and a graph-structured recurrent neural network (GSRNN) that discovers the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real-time interactions of the graph nodes to enable more accurate real-time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting.
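
The abstract only outlines the coupling, so the sketch below is a loose interpretation of the recurrent, graph-structured part: each node keeps a recurrent state that is updated from its own input together with an aggregate of its neighbors' previous states on the inferred graph. The GRU cell, mean aggregation, and all dimensions are assumptions for illustration, not the paper's exact GSRNN design.

```python
# Loose sketch of one step of a graph-structured RNN on an inferred graph.
# GRU cell, mean aggregation, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class GSRNNStep(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.cell = nn.GRUCell(in_dim + hidden_dim, hidden_dim)

    def forward(self, x, h, adj):
        # x:   (num_nodes, in_dim)      node inputs at the current time step
        # h:   (num_nodes, hidden_dim)  previous hidden states
        # adj: (num_nodes, num_nodes)   adjacency matrix of the inferred graph
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_msg = adj @ h / deg  # mean of neighboring hidden states
        return self.cell(torch.cat([x, neighbor_msg], dim=-1), h)


if __name__ == "__main__":
    n, d, hdim = 5, 8, 16
    step = GSRNNStep(d, hdim)
    adj = (torch.rand(n, n) > 0.5).float()
    h = torch.zeros(n, hdim)
    for t in range(3):                # unroll over a few time steps
        h = step(torch.randn(n, d), h, adj)
    print(h.shape)                    # torch.Size([5, 16])
```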