Relational Attention: Generalizing Transformers for Graph-Structured Tasks

Cameron Diao, Ricky Loynd

arXiv.org Artificial Intelligence 

Transformers flexibly operate over sets of real-valued vectors representing task-specific entities and their attributes, where each vector might encode one word-piece token and its position in a sequence, or some piece of information that carries no position at all. But as set processors, standard transformers are at a disadvantage in reasoning over more general graph-structured data, where nodes represent entities and edges represent relations between entities. To address this shortcoming, we generalize transformer attention to consider and update edge vectors in each transformer layer. We evaluate this relational transformer on a diverse array of graph-structured tasks, including the large and challenging CLRS Algorithmic Reasoning Benchmark. There, it dramatically outperforms state-of-the-art graph neural networks expressly designed to reason over graph-structured data. Our analysis demonstrates that these gains are attributable to relational attention's inherent ability to leverage the greater expressivity of graphs over sets.

Graph-structured problems turn up in many domains, including knowledge bases (Hu et al., 2021; Bordes et al., 2013), communication networks (Leskovec et al., 2010), citation networks (McCallum et al., 2000), and molecules (Debnath et al., 1991; Zhang et al., 2020b). One example is predicting the bioactive properties of a molecule, where the atoms of the molecule are the nodes of the graph and the bonds are the edges. Along with their ubiquity, graph-structured problems vary widely in difficulty: certain graph problems can be solved with a simple multi-layer perceptron, while others are quite challenging and require explicit modeling of relational characteristics. Graph Neural Networks (GNNs) are designed to process graph-structured data, including the graph's (possibly directed) edge structure and, in some cases, features associated with the edges. Standard transformers lack the relational inductive biases (Battaglia et al., 2018) that are explicitly built into the most commonly used GNNs. In exchange, they retain flexibility: entities carrying domain-specific attributes (like position) can be encoded as vectors for input to the same transformer architecture applied across different domains.

Work was done during an internship at Microsoft Research.

Figure 1: The relational transformer.
Figure 2: Categories of GNNs and Transformers, compared in terms of transformer machinery and edge vector incorporation.
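The core mechanism, attention that both reads and updates a vector per edge in every layer, can be sketched as follows. This is a minimal PyTorch illustration under assumed details; the tensor layout, projection names, and the edge-update rule are illustrative assumptions, not the authors' exact formulation.

    # Sketch of edge-aware ("relational") attention, NOT the paper's exact equations.
    # Edge vectors e_ij enter the keys and values and are themselves updated per layer.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RelationalAttentionSketch(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k_node = nn.Linear(dim, dim)
            self.k_edge = nn.Linear(dim, dim)
            self.v_node = nn.Linear(dim, dim)
            self.v_edge = nn.Linear(dim, dim)
            # Hypothetical edge update: new edge vector from both endpoint
            # node vectors concatenated with the old edge vector.
            self.edge_update = nn.Linear(3 * dim, dim)
            self.scale = dim ** -0.5

        def forward(self, nodes: torch.Tensor, edges: torch.Tensor):
            # nodes: (N, dim) node vectors; edges: (N, N, dim) edge vectors e_ij.
            q = self.q(nodes)                                          # (N, dim)
            k = self.k_node(nodes)[None, :, :] + self.k_edge(edges)   # (N, N, dim)
            v = self.v_node(nodes)[None, :, :] + self.v_edge(edges)   # (N, N, dim)

            # Attention logits mix node and edge information: a_ij = q_i . k_ij.
            logits = torch.einsum("id,ijd->ij", q, k) * self.scale
            attn = F.softmax(logits, dim=-1)                           # (N, N)
            new_nodes = torch.einsum("ij,ijd->id", attn, v)            # (N, dim)

            # Edge vectors are also updated in every layer (assumed rule).
            n = nodes.size(0)
            src = nodes[:, None, :].expand(n, n, -1)
            dst = nodes[None, :, :].expand(n, n, -1)
            new_edges = self.edge_update(torch.cat([src, dst, edges], dim=-1))
            return new_nodes, new_edges

    if __name__ == "__main__":
        layer = RelationalAttentionSketch(dim=16)
        nodes = torch.randn(5, 16)       # 5 entities
        edges = torch.randn(5, 5, 16)    # one vector per ordered node pair
        nodes, edges = layer(nodes, edges)
        print(nodes.shape, edges.shape)  # (5, 16) and (5, 5, 16)

In this sketch, edge vectors contribute to both keys and values, so relations influence which neighbors a node attends to and what information it aggregates, and updating the edge vectors each layer lets relational information evolve alongside the node representations.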
