exphormer
On the Limits of Applying Graph Transformers for Brain Connectome Classification
Lara-Rangel, Jose, Heinbaugh, Clare
However, it did not produce improvements in performance or other notable benefits, see Appendix 1. For the Exphormer, we experimented with different numbers of layers, dropout rates for the network and attention mechanism, and numbers of attention heads. The final configuration used dropout probability of 0.1, attention dropout of 0.3, 2 layers, and 4 attention heads. All experiments used learning rate decay starting at 0.001, decaying by 1e 5, over a total of 100 epochs with 5 warmup epochs. We used three different seeds for both the Exphormer and ResidualGCN and assessed the alignment with the results in Said et al. (2023), which only included one run for each experiment. Apart from evaluating performance, we investigated potential advantages of using attention-based models. Our hypothesis was that the attention mechanism could enhance robustness to data noise, particularly in scenarios where certain graph structure components, such as edges, are missing. To verify that the graph structure, nodes and edges taken together, convey meaningful information for prediction, it is important to compare models under noisy or incomplete data settings. We simulate noisy incomplete data by removing edges based on a pre-specified probability of edge removal.
Diffusion on Graph: Augmentation of Graph Structure for Node Classification
Wang, Yancheng, Liu, Changyu, Yang, Yingzhen
Graph diffusion models have recently been proposed to synthesize entire graphs, such as molecule graphs. Although existing methods have shown great performance in generating entire graphs for graph-level learning tasks, no graph diffusion models have been developed to generate synthetic graph structures, that is, synthetic nodes and associated edges within a given graph, for node-level learning tasks. Inspired by the research in the computer vision literature using synthetic data for enhanced performance, we propose Diffusion on Graph (DoG), which generates synthetic graph structures to boost the performance of GNNs. The synthetic graph structures generated by DoG are combined with the original graph to form an augmented graph for the training of node-level learning tasks, such as node classification and graph contrastive learning (GCL). To improve the efficiency of the generation process, a Bi-Level Neighbor Map Decoder (BLND) is introduced in DoG. To mitigate the adverse effect of the noise introduced by the synthetic graph structures, a low-rank regularization method is proposed for the training of graph neural networks (GNNs) on the augmented graphs. Extensive experiments on various graph datasets for semi-supervised node classification and graph contrastive learning have been conducted to demonstrate the effectiveness of DoG with low-rank regularization.
Even Sparser Graph Transformers
Shirzad, Hamed, Lin, Honghao, Venkatachalam, Balaji, Velingker, Ameya, Woodruff, David, Sutherland, Danica
Graph Transformers excel in long-range dependency modeling, but generally require quadratic memory complexity in the number of nodes in an input graph, and hence have trouble scaling to large graphs. Sparse attention variants such as Exphormer can help, but may require high-degree augmentations to the input graph for good performance, and do not attempt to sparsify an already-dense input graph. As the learned attention mechanisms tend to use few of these edges, such high-degree connections may be unnecessary. We show (empirically and with theoretical backing) that attention scores on graphs are usually quite consistent across network widths, and use this observation to propose a two-stage procedure, which we call Spexphormer: first, train a narrow network on the full augmented graph. Next, use only the active connections to train a wider network on a much sparser graph. We establish theoretical conditions when a narrow network's attention scores can match those of a wide network, and show that Spexphormer achieves good performance with drastically reduced memory requirements on various graph datasets.
UBC, Google & Amii's Exphormer: Scaling Graph Transformers While Slashing Costs
The ability of graph transformers (GT) to model long-range interactions has improved expressivity and made such architectures a promising alternative to traditional graph neural network (GNN) approaches. The GT downside is their poor scalability, which renders them prohibitively expensive when dealing with large and complex graphs. In the new paper Exphormer: Sparse Transformers for Graphs, a research team from the University of British Columbia, Google Research and the Alberta Machine Intelligence Institute proposes Exphormer, a framework that equips sparse GTs with impressive scalability, reduces computational complexity to linear, and achieves state-of-the-art performance on graph benchmarks. The proposed Exphormer is based on and inherits the desirable properties of the recently introduced GraphGPS (Rampasek et al., 2022) modular framework for building general, powerful and scalable GTs with linear complexity. GraphGPS combines traditional local message passing and a global attention mechanism while also allowing for "sparse" attention mechanisms to boost performance and reduce computation costs.