Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning

Haitz Sáez de Ocáriz Borde, Artem Lukoianov, Anastasis Kratsios, Michael Bronstein, Xiaowen Dong

arXiv.org Artificial Intelligence

Traditionally, Graph Neural Networks (GNNs) [1] have been applied to model functions over graphs with a relatively modest number of nodes. Recently, however, there has been growing interest in applying GNNs to large-scale graph benchmarks, including datasets with up to a hundred million nodes [2]. This line of work could lead to better models for industrial applications such as large-scale network analysis in social media, where there are typically millions of users, or in biology, where proteins and other macromolecules are composed of large numbers of atoms. Designing GNNs that scale to such graphs while retaining their effectiveness is a significant challenge. To this end, we take inspiration from the literature on Large Language Models (LLMs) and propose a simple modification to how GNN architectures are typically arranged. Our framework, Scalable Message Passing Neural Networks (SMPNNs), enables the construction of deep, scalable architectures that outperform the current state-of-the-art models on large graph benchmarks in transductive classification. More specifically, we find that following the typical construction of the Pre-Layer Normalization (Pre-LN) Transformer formulation [3] and replacing attention with standard message-passing convolution is enough to outperform the best Graph Transformers in the literature. Moreover, since our formulation does not necessarily require attention, our architecture scales better than Graph Transformers.
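
The architectural idea described above can be illustrated with a short sketch: a Pre-LN residual block in which the attention sub-layer is replaced by a standard message-passing convolution. This is a minimal illustration rather than the authors' reference implementation; the choice of GCNConv as the convolution, the inclusion of a feed-forward sub-layer, and all layer and class names here are assumptions made for concreteness.

```python
# Minimal sketch (not the paper's reference implementation) of a Pre-LN block
# where the attention sub-layer is replaced by message-passing convolution.
# GCNConv is an illustrative choice of convolution; the feed-forward sub-layer
# and all names are assumptions.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


class PreLNMessagePassingBlock(nn.Module):
    def __init__(self, dim: int, ffn_mult: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.conv = GCNConv(dim, dim)  # message passing in place of attention
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_mult * dim),
            nn.GELU(),
            nn.Linear(ffn_mult * dim, dim),
        )

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # Pre-LN residual updates: normalize, transform, then add back.
        x = x + self.conv(self.norm1(x), edge_index)
        x = x + self.ffn(self.norm2(x))
        return x


if __name__ == "__main__":
    x = torch.randn(5, 16)  # 5 nodes, 16 features each
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # toy edge list
    block = PreLNMessagePassingBlock(16)
    print(block(x, edge_index).shape)  # torch.Size([5, 16])
```

Stacking such blocks yields a deep residual message-passing architecture whose per-layer cost is linear in the number of edges, in contrast to the quadratic-in-nodes cost of full attention, which is consistent with the scaling advantage over Graph Transformers claimed above.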