HyPE-GT: where Graph Transformers meet Hyperbolic Positional Encodings

Bose, Kushal, Das, Swagatam

arXiv.org Artificial Intelligence 

Graph Transformers (GTs) facilitate the comprehension of graph-structured data by calculating the self-attention of node pairs without considering node position information. To address this limitation, we introduce an innovative and efficient framework that incorporates Positional Encodings (PEs) into the Transformer by generating a set of learnable positional encodings in hyperbolic space, a non-Euclidean domain. This approach empowers us to explore diverse options for selecting the optimal PEs for specific downstream tasks, leveraging hyperbolic neural networks or hyperbolic graph convolutional networks. Additionally, we repurpose these positional encodings to mitigate the impact of over-smoothing in deep Graph Neural Networks (GNNs). Comprehensive experiments on molecular benchmark datasets, co-author, and co-purchase networks substantiate the effectiveness of hyperbolic positional encodings in enhancing the performance of deep GNNs.

Graph Transformers (GTs) are earmarked as one of the milestones for modelling the interactions between node pairs in a graph. Message-passing GNNs suffer from several limitations: over-smoothing [1], which occurs due to recursive neighborhood aggregation; over-squashing [2], an information bottleneck caused by the exponential growth of information as the receptive field expands; and bounded expressive power [3], [4]. Graph Transformers confront these limitations by evaluating pair-wise self-attention, but without structural inductive bias, which causes the loss of positional information of the nodes. As an effective solution, positional encodings, in the form of vectors integrated with the node features, make the respective nodes topologically aware in the graph. Recently, many efforts have been made to generate effective positional encodings for GTs, such as spectral decomposition-based learnable encodings [5], structure-aware PEs generated from the rooted subgraph or subtree of each node [6], encodings of structural dependency [7], random walk-based learnable positional encodings [8], and many more.
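Concretely, one common way to realise positional encodings in hyperbolic space is to initialise learnable Euclidean vectors and lift them onto the manifold, for example via the exponential map at the origin of the Poincaré ball. The sketch below illustrates only this lifting step; the function name, dimensions, and unit curvature are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def exp_map_zero(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c:
    lifts Euclidean (tangent-space) vectors onto the hyperbolic manifold."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    norm = np.maximum(norm, 1e-9)  # guard against division by zero
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

# Toy setup (hypothetical sizes): 5 nodes, 4-dimensional learnable PEs
rng = np.random.default_rng(0)
pe_euclidean = rng.normal(size=(5, 4))   # trainable parameters in practice
pe_hyperbolic = exp_map_zero(pe_euclidean)

# All lifted points lie strictly inside the unit ball when c = 1
print(np.linalg.norm(pe_hyperbolic, axis=-1))
```

In a full pipeline, the lifted encodings would be combined with the node features before self-attention, and the underlying Euclidean vectors trained end-to-end, e.g. through a hyperbolic neural network or a hyperbolic graph convolutional network, as the framework proposes.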
The positional encodings derived from existing works suffer from several key limitations. For instance, the Spectral Attention Network (SAN) generates learnable positional encodings through spectral decomposition of the Laplacian matrix, which is computationally expensive for large graphs.
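For reference, a SAN/LapPE-style spectral scheme uses eigenvectors of the normalised graph Laplacian as positional encodings. A minimal NumPy sketch (the function name and toy graph are illustrative) makes the cost concern concrete: the full eigendecomposition is O(n^3) in the number of nodes, which is one reason spectral PEs scale poorly:

```python
import numpy as np

def laplacian_pe(adj, k):
    """Return the k smallest non-trivial eigenvectors of the
    symmetric-normalised Laplacian as node positional encodings."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]              # skip the trivial constant mode

# Toy graph: a 4-node cycle
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
pe = laplacian_pe(adj, k=2)
print(pe.shape)  # (4, 2): one 2-dimensional encoding per node
```

A further well-known caveat of such eigenvector-based encodings is their sign ambiguity (each eigenvector is defined only up to sign), which learnable schemes such as SAN must account for during training.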