FedGT: Federated Node Classification with Scalable Graph Transformer

Zhang, Zaixi, Hu, Qingyong, Yu, Yang, Gao, Weibo, Liu, Qi

arXiv.org Artificial Intelligence 

Graphs are widely used to model relational data. As graphs are getting larger and larger in real-world scenarios, there is a trend to store and compute subgraphs in multiple local systems. For example, recently proposed subgraph federated learning methods train Graph Neural Networks (GNNs) distributively on local subgraphs and aggregate GNN parameters with a central server. However, existing methods have the following limitations: (1) The links between local subgraphs are missing in subgraph federated learning. This could severely damage the performance of GNNs that follow message-passing paradigms to update node/edge features. To address the aforementioned challenges, we propose a scalable Federated Graph Transformer (FedGT) in the paper. Firstly, we design a hybrid attention scheme to reduce the complexity of the Graph Transformer to linear while ensuring a global receptive field with theoretical bounds. Specifically, each node attends to the sampled local neighbors and a set of curated global nodes to learn both local and global information and be robust to missing links. The global nodes are dynamically updated during training with an online clustering algorithm to capture the data distribution of the corresponding local subgraph. Secondly, FedGT computes clients' similarity based on the aligned global nodes with optimal transport. The similarity is then used to perform weighted averaging for personalized aggregation, which well addresses the data heterogeneity problem. Finally, extensive experimental results on 6 datasets and 2 subgraph settings demonstrate the superiority of FedGT. Many real-world relational data can be represented as graphs, such as social networks (Fan et al., 2019), molecule graphs (Satorras et al., 2021), and commercial trading networks (Xu et al., 2021). Due to the ever-growing size of graph (Hu et al., 2020a) and stricter privacy constraints such as GDPR (Voigt & Von dem Bussche, 2017), it becomes more practical to collect and store sensitive graph data in local systems instead in a central server. For example, banks may have their own relational databases to track commercial relationships between companies and customers. In such scenarios, it is desirable to collaboratively train a powerful and generalizable graph mining model for business, e.g., loan prediction with distributed subgraphs while not sharing private data.