Improving Graph Attention Networks with Large Margin-based Constraints
Guangtao Wang, Rex Ying, Jing Huang, Jure Leskovec
Graph Attention Networks (GATs) are the state-of-the-art neural architecture for representation learning with graphs. GATs learn attention functions that assign weights to nodes so that different nodes have different influence in the feature aggregation steps. In practice, however, the induced attention functions are prone to over-fitting due to the increasing number of parameters and the lack of direct supervision on the attention weights. GATs also suffer from over-smoothing at the decision boundary between node classes. Here we propose a framework that addresses these weaknesses via margin-based constraints on attention during training. We first theoretically demonstrate the over-smoothing behavior of GATs, and then develop an approach that constrains the attention weights according to the class boundary and the feature aggregation pattern. Furthermore, to alleviate the over-fitting problem, we propose additional constraints based on the graph structure. Extensive experiments and ablation studies on common benchmark datasets demonstrate the effectiveness of our method, which yields significant improvements over previous state-of-the-art graph attention methods on all datasets.

Introduction

Many real-world applications involve graph data, such as social networks (Zhang and Chen 2018), chemical molecules (Gilmer et al. 2017), and recommender systems (Berg, Kipf, and Welling 2017). The complicated structure of these graphs has inspired new machine learning methods (Cai, Zheng, and Chang 2018; Wu et al. 2019b). Much recent attention and progress has focused on graph neural networks, which have been successfully applied to social network analysis (Battaglia et al. 2016), recommender systems (Ying et al. 2018), and machine reading comprehension (Tu et al. 2019; De Cao, Aziz, and Titov 2018). A novel architecture that brings the attention mechanism into Graph Neural Networks (GNNs), called Graph Attention Networks (GATs), was introduced by Veličković et al. (2017). GAT was motivated by attention mechanisms in natural language processing (Vaswani et al. 2017; Devlin et al. 2018).
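The excerpt describes GAT's attention-weighted feature aggregation and the idea of margin-based constraints on attention, but not their exact form. The following PyTorch sketch shows the standard single-head GAT attention computation (Veličković et al. 2017) together with a hypothetical hinge-style margin penalty that pushes each node's attention toward same-class neighbors; the penalty, and the names gat_attention and margin_penalty, are illustrative assumptions, not the authors' exact formulation.

import torch
import torch.nn.functional as F

def gat_attention(h, adj, W, a, negative_slope=0.2):
    # h: [N, F_in] node features; adj: [N, N] 0/1 adjacency with self-loops;
    # W: [F_in, F_out] shared linear map; a: [2 * F_out] attention vector.
    z = h @ W                                    # transformed features, [N, F_out]
    f_out = z.size(1)
    src = z @ a[:f_out]                          # a_1^T z_i, shape [N]
    dst = z @ a[f_out:]                          # a_2^T z_j, shape [N]
    # e_ij = LeakyReLU(a^T [z_i || z_j]); mask out non-edges before the softmax
    e = F.leaky_relu(src.unsqueeze(1) + dst.unsqueeze(0), negative_slope)
    e = e.masked_fill(adj == 0, float("-inf"))
    alpha = torch.softmax(e, dim=1)              # attention over each node's neighbors
    return alpha, alpha @ z                      # attention weights, aggregated features

def margin_penalty(alpha, labels, adj, margin=0.1):
    # Hypothetical constraint: each node's mean attention to same-class neighbors
    # should exceed its mean attention to different-class neighbors by `margin`.
    edge = adj > 0
    same = ((labels.unsqueeze(1) == labels.unsqueeze(0)) & edge).float()
    diff = ((labels.unsqueeze(1) != labels.unsqueeze(0)) & edge).float()
    same_mean = (alpha * same).sum(1) / same.sum(1).clamp(min=1)
    diff_mean = (alpha * diff).sum(1) / diff.sum(1).clamp(min=1)
    return F.relu(margin - (same_mean - diff_mean)).mean()  # hinge-style loss

torch.manual_seed(0)
N, F_in, F_out = 6, 8, 4
h = torch.randn(N, F_in)
adj = (torch.rand(N, N) > 0.5).float()
adj.fill_diagonal_(1.0)                          # self-loops: every node attends to itself
W = torch.randn(F_in, F_out, requires_grad=True)
a = torch.randn(2 * F_out, requires_grad=True)
labels = torch.randint(0, 2, (N,))
alpha, out = gat_attention(h, adj, W, a)
loss = margin_penalty(alpha, labels, adj)
loss.backward()                                  # penalty is differentiable w.r.t. W and a

In an actual training loop, such a penalty would typically be added to the supervised classification loss with a weighting coefficient, applied only where node labels are available.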