edge transformer
Systematic Generalization with Edge Transformers
Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose Edge Transformer, a new model that combines inspiration from Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector states with every edge, that is, with every pair of input nodes, as opposed to just every node, as is done in the Transformer model. The second major innovation is a triangular attention mechanism that updates edge representations in a way that is inspired by unification from logic programming. We evaluate Edge Transformer on compositional generalization benchmarks in relational reasoning, semantic parsing, and dependency parsing. In all three settings, the Edge Transformer outperforms Relation-aware, Universal, and classical Transformer baselines.
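To make the two ideas concrete, below is a minimal single-head sketch of edge states and triangular attention in Python. It keeps one d-dimensional vector per ordered node pair and updates edge (i, j) by attending over all pivot nodes l, combining edges (i, l) and (l, j). The weight names, the factored value term, and the omission of multi-head projections, residuals, and normalization are simplifying assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def triangular_attention(X, Wq, Wk, Wv1, Wv2):
    # X: edge states, shape [n, n, d] -- one vector per ordered node pair (i, j).
    n, _, d = X.shape
    Q  = X @ Wq   # queries computed on edges (i, l)
    K  = X @ Wk   # keys computed on edges (l, j)
    V1 = X @ Wv1  # value half from edge (i, l)
    V2 = X @ Wv2  # value half from edge (l, j)
    # scores[i, l, j] = <Q[i, l], K[l, j]> / sqrt(d)
    scores = np.einsum('ild,ljd->ilj', Q, K) / np.sqrt(d)
    att = softmax(scores, axis=1)  # normalize over the pivot node l
    # value for triple (i, l, j): elementwise product of the two edge halves
    vals = np.einsum('ild,ljd->iljd', V1, V2)
    return np.einsum('ilj,iljd->ijd', att, vals)  # updated edge states [n, n, d]

# Toy usage: 5 nodes -> 25 edge states of width 16.
rng = np.random.default_rng(0)
n, d = 5, 16
X = rng.normal(size=(n, n, d))
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4)]
print(triangular_attention(X, *Ws).shape)  # (5, 5, 16)
```

The softmax over the pivot node l is what gives the update its unification-like flavour: the new state of edge (i, j) is assembled from compatible pairs of intermediate edges (i, l) and (l, j), much like chaining two facts in a logic rule.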
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Germany (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)
A Additional Experiment Details
The experiments were performed on a cluster of 12 GPUs (two with 24 GB, two with 12 GB, and eight with 11 GB of memory). For Transformer models, the number of layers varied from 5 to 8, and the number of heads was fixed at 8. No hyperparameter search was performed for Edge Transformer on COGS; its architecture hyperparameters were matched to those of Ontanón et al. (2021), who tuned the number of layers. We therefore use three layers for Edge Transformer. Default settings were used for optimizer hyperparameters.
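For concreteness, the stated settings can be collected into a small configuration sketch; the dictionary names and the empty optimizer entry are illustrative assumptions (the appendix says only that default optimizer settings were used).

```python
# Hypothetical config dicts assembled from the appendix text; the names and
# structure are illustrative assumptions, not the authors' actual code.
edge_transformer_cogs = {
    "num_layers": 3,   # matched to Ontanón et al. (2021); no search on COGS
    "num_heads": 8,    # fixed across all models
    "optimizer": {},   # "default settings" per the appendix (assumption: framework defaults)
}
transformer_baselines = {
    "num_layers": [5, 6, 7, 8],  # varied from 5 to 8
    "num_heads": 8,
}
```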
Towards Principled Graph Transformers
Müller, Luis, Kusuma, Daniel, Morris, Christopher
Graph Neural Networks (GNNs) are the de facto standard in graph learning [Kipf and Welling, 2017, Gilmer et al., 2017, Scarselli et al., 2009, Xu et al., 2019] but suffer from limited expressivity in distinguishing non-isomorphic graphs in terms of the 1-dimensional Weisfeiler-Leman algorithm (1-WL) [Morris et al., 2019, Xu et al., 2019]. Hence, recent works introduced higher-order GNNs, aligned with the k-dimensional Weisfeiler-Leman (k-WL) hierarchy for graph isomorphism testing [Azizian and Lelarge, 2021, Morris et al., 2019, 2020, 2022], resulting in more expressivity with an increase in k > 1. The k-WL hierarchy draws from a rich history in graph theory [Babai, 1979, Babai and Kucera, 1979, Babai et al., 1980, Cai et al., 1992, Weisfeiler and Leman, 1968], offering a deep theoretical understanding of k-WL-aligned GNNs. While theoretically intriguing, higher-order GNNs often fail to deliver state-of-the-art performance on real-world problems, making theoretically grounded models less relevant in practice [Azizian and Lelarge, 2021, Morris et al., 2020, 2022]. In contrast, graph transformers [Glickman and Yahav, 2023, He et al., 2023, Ma et al., 2023, Rampášek et al., 2022, Ying et al., 2021] recently demonstrated state-of-the-art empirical performance. However, they draw their expressive power mostly from positional/structural encodings (PEs), making it difficult to understand these models in terms of an expressivity hierarchy such as the k-WL. While a few works theoretically aligned graph transformers with the k-WL hierarchy [Kim et al., 2021, 2022, Zhang et al., 2023], we are not aware of any works
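Since the excerpt's expressivity argument rests on 1-WL, the following minimal colour-refinement sketch (plain Python; all names are my own) shows the classic failure case it alludes to: 1-WL assigns identical colour multisets to two disjoint triangles and to a single 6-cycle, even though the graphs are not isomorphic.

```python
def wl_1(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman (colour refinement) sketch.

    adj: dict mapping each node to an iterable of neighbours.
    Returns the sorted multiset of final colours; different multisets
    certify non-isomorphism, while equal multisets are inconclusive --
    the source of 1-WL's limited expressivity.
    """
    colors = {v: 0 for v in adj}  # uniform initial colouring
    for _ in range(rounds):
        # new signature = (own colour, sorted multiset of neighbour colours)
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        # compress signatures to small integer colours
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return sorted(colors.values())

# Classic failure case: two triangles vs. one 6-cycle (both 2-regular).
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
assert wl_1(two_triangles) == wl_1(six_cycle)  # indistinguishable by 1-WL
```

Any GNN bounded by 1-WL inherits exactly this blind spot, which is what motivates the higher-order, k-WL-aligned models the excerpt discusses.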
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Wisconsin (0.04)
- North America > United States > Texas (0.04)
- Europe > Germany (0.04)
Friend Ranking in Online Games via Pre-training Edge Transformers
Yao, Liang, Peng, Jiazhen, Ji, Shenggong, Liu, Qiang, Cai, Hongyun, He, Feng, Cheng, Xu
Friend recall is an important way to improve Daily Active Users (DAU) in online games. Essentially, the problem is to generate a proper ranking list of lost friends. Traditional friend recall methods rely on rules such as friend intimacy, or train a classifier to predict lost players' return probability, but they ignore the features of (active) players and historical friend recall events. In this work, we treat friend recall as a link prediction problem and explore several link prediction methods that can use the features of both active and lost players, as well as historical events. Furthermore, we propose a novel Edge Transformer model and pre-train it via masked auto-encoders. Our method achieves state-of-the-art results in offline experiments and online A/B tests across three Tencent games.
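As a minimal illustration of the link-prediction framing (not the paper's model), the sketch below scores (active player, lost player) pairs from their feature vectors and ranks lost friends by predicted return probability; the bilinear scorer and all names are assumptions introduced for illustration.

```python
import numpy as np

def link_score(active_feats, lost_feats, W, b=0.0):
    """Hypothetical pairwise scorer: friend recall as link prediction.

    active_feats: [m, d] features of active players.
    lost_feats:   [k, d] features of lost players.
    Returns an [m, k] matrix of predicted return probabilities; sorting a
    row descending yields that player's lost-friend ranking list.
    """
    # bilinear compatibility followed by a sigmoid
    logits = active_feats @ W @ lost_feats.T + b
    return 1.0 / (1.0 + np.exp(-logits))

# Toy usage: one active player, three lost friends to rank.
rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)
active = rng.normal(size=(1, d))
lost = rng.normal(size=(3, d))
scores = link_score(active, lost, W)      # shape (1, 3)
ranking = np.argsort(-scores[0])          # lost friends, best candidate first
```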
- Asia > Taiwan > Taiwan Province > Taipei (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)