Distinguished In Uniform: Self Attention Vs. Virtual Nodes

Eran Rosenbluth, Jan Tönshoff, Martin Ritzert, Berke Kisin, Martin Grohe

arXiv.org Artificial Intelligence 

Graph Transformers (GTs) such as SAN and GPS are graph-processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: graphs of different sizes may require different approximating networks. We first clarify that this form of universality is not unique to GTs: using the same positional encodings, pure MPGNNs and even 2-layer MLPs are also non-uniform universal approximators. We then consider uniform expressivity: the target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions lies in their global computation method: Self-Attention vs. Virtual Node. We prove that neither model is a uniform-universal approximator, before proving our main result: neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data, and we further augment our study with real-world datasets, observing mixed results that indicate no clear ranking in practice either.

In the field of graph learning, message-passing GNNs have long been the undisputed model architecture, even though their basic form is upper-bounded in expressivity by the 1-dimensional Weisfeiler-Leman algorithm (Morris et al., 2020; Xu et al., 2019).
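To make the architectural contrast concrete, the following is a minimal illustrative sketch (not the paper's SAN, GPS, or MPGNN+VN implementations) of the two global computation mechanisms discussed above: broadcasting a pooled virtual-node state versus mixing node states with graph-agnostic self-attention. The layer definitions, dimensions, and the dense adjacency-matrix representation are illustrative assumptions.

```python
# Sketch only: contrasts a virtual-node global update with a global
# self-attention update on top of a shared message-passing step.
import torch
import torch.nn as nn


class MPGNNLayer(nn.Module):
    """One message-passing layer: sum neighbor states, then update each node."""

    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n, dim) node states, adj: (n, n) adjacency matrix
        messages = adj @ x  # sum over neighbors
        return torch.relu(self.update(torch.cat([x, messages], dim=-1)))


class VirtualNodeGlobal(nn.Module):
    """Global computation via a virtual node: pool all node states,
    transform the pooled state, and add it back to every node."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        vn = self.mlp(x.sum(dim=0, keepdim=True))  # (1, dim) virtual-node state
        return x + vn                              # broadcast back to all nodes


class SelfAttentionGlobal(nn.Module):
    """Global computation via single-head scaled dot-product self-attention
    over all nodes, independent of the graph's edges."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(0, 1) / (x.shape[-1] ** 0.5)  # (n, n) pairwise scores
        return torch.softmax(scores, dim=-1) @ v               # attention-weighted mix


if __name__ == "__main__":
    n, dim = 5, 8
    x = torch.randn(n, dim)                # node features (positional encodings omitted)
    adj = (torch.rand(n, n) > 0.5).float() # random adjacency, illustrative only

    local = MPGNNLayer(dim)(x, adj)                 # shared local message-passing step
    print(VirtualNodeGlobal(dim)(local).shape)      # MPGNN + Virtual Node global update
    print(SelfAttentionGlobal(dim)(local).shape)    # GT-style global update
```

Both global updates let information travel between arbitrary node pairs in one step; the paper's point is that, in the uniform setting, neither mechanism's expressivity subsumes the other's.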
