Xu, Tingyang
On Self-Distilling Graph Neural Network
Chen, Yuzhao, Bian, Yatao, Xiao, Xi, Rong, Yu, Xu, Tingyang, Huang, Junzhou
Recently, the teacher-student knowledge distillation framework has demonstrated its potential in training Graph Neural Networks (GNNs). However, due to the difficulty of training deep and wide GNN models, one can not always obtain a satisfactory teacher model for distillation. Furthermore, the inefficient training process of teacher-student knowledge distillation also impedes its applications in GNN models. In this paper, we propose the first teacher-free knowledge distillation framework for GNNs, termed GNN Self-Distillation (GNN-SD), that serves as a drop-in replacement for improving the training process of GNNs.We design three knowledge sources for GNN-SD: neighborhood discrepancy rate (NDR), compact graph embedding and intermediate logits. Notably, serving as a metric of the non-smoothness of the embedded graph, NDR empowers the transferability of knowledge that maintains high neighborhood discrepancy by enforcing consistency between consecutive GNN layers. We conduct exploring analysis to verify that our framework could improve the training dynamics and embedding quality of GNNs. Extensive experiments on various popular GNN models and datasets demonstrate that our approach obtains consistent and considerable performance enhancement, proving its effectiveness and generalization ability.
Graph Information Bottleneck for Subgraph Recognition
Yu, Junchi, Xu, Tingyang, Rong, Yu, Bian, Yatao, Huang, Junzhou, He, Ran
Given the input graph and its label/property, several key problems of graph learning, such as finding interpretable subgraphs, graph denoising and graph compression, can be attributed to the fundamental problem of recognizing a subgraph of the original one. This subgraph shall be as informative as possible, yet contains less redundant and noisy structure. This problem setting is closely related to the well-known information bottleneck (IB) principle, which, however, has less been studied for the irregular graph data and graph neural networks (GNNs). In this paper, we propose a framework of Graph Information Bottleneck (GIB) for the subgraph recognition problem in deep graph learning. Under this framework, one can recognize the maximally informative yet compressive subgraph, named IB-subgraph. However, the GIB objective is notoriously hard to optimize, mostly due to the intractability of the mutual information of irregular graph data and the unstable optimization process. In order to tackle these challenges, we propose: i) a GIB objective based-on a mutual information estimator for the irregular graph data; ii) a bi-level optimization scheme to maximize the GIB objective; iii) a connectivity loss to stabilize the optimization process. We evaluate the properties of the IB-subgraph in three application scenarios: improvement of graph classification, graph interpretation and graph denoising. Extensive experiments demonstrate that the information-theoretic IB-subgraph enjoys superior graph properties.
Tackling Over-Smoothing for General Graph Convolutional Networks
Huang, Wenbing, Rong, Yu, Xu, Tingyang, Sun, Fuchun, Huang, Junzhou
Increasing the depth of GCN, which is expected to permit more expressivity, is shown to incur performance detriment especially on node classification. The main cause of this lies in over-smoothing. The over-smoothing issue drives the output of GCN towards a space that contains limited distinguished information among nodes, leading to poor expressivity. Several works on refining the architecture of deep GCN have been proposed, but it is still unknown in theory whether or not these refinements are able to relieve over-smoothing. In this paper, we first theoretically analyze how general GCNs act with the increase in depth, including generic GCN, GCN with bias, ResGCN, and APPNP. We find that all these models are characterized by a universal process: all nodes converging to a cuboid. Upon this theorem, we propose DropEdge to alleviate over-smoothing by randomly removing a certain number of edges at each training epoch. Theoretically, DropEdge either reduces the convergence speed of over-smoothing or relieves the information loss caused by dimension collapse. Experimental evaluations on simulated dataset have visualized the difference in over-smoothing between different GCNs. Moreover, extensive experiments on several real benchmarks support that DropEdge consistently improves the performance on a variety of both shallow and deep GCNs.
Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels?
Bian, Tian, Xiao, Xi, Xu, Tingyang, Rong, Yu, Huang, Wenbing, Zhao, Peilin, Huang, Junzhou
Graph Identification (GI) has long been researched in graph learning and is essential in certain applications (e.g. social community detection). Specifically, GI requires to predict the label/score of a target graph given its collection of node features and edge connections. While this task is common, more complex cases arise in practice---we are supposed to do the inverse thing by, for example, grouping similar users in a social network given the labels of different communities. This triggers an interesting thought: can we identify nodes given the labels of the graphs they belong to? Therefore, this paper defines a novel problem dubbed Inverse Graph Identification (IGI), as opposed to GI. Upon a formal discussion of the variants of IGI, we choose a particular case study of node clustering by making use of the graph labels and node features, with an assistance of a hierarchical graph that further characterizes the connections between different graphs. To address this task, we propose Gaussian Mixture Graph Convolutional Network (GMGCN), a simple yet effective method that makes the node-level message passing process using Graph Attention Network (GAT) under the protocol of GI and then infers the category of each node via a Gaussian Mixture Layer (GML). The training of GMGCN is further boosted by a proposed consensus loss to take advantage of the structure of the hierarchical graph. Extensive experiments are conducted to test the rationality of the formulation of IGI. We verify the superiority of the proposed method compared to other baselines on several benchmarks we have built up. We will release our codes along with the benchmark data to facilitate more research attention to the IGI problem.
Multi-View Graph Neural Networks for Molecular Property Prediction
Ma, Hehuan, Bian, Yatao, Rong, Yu, Huang, Wenbing, Xu, Tingyang, Xie, Weiyang, Ye, Geyan, Huang, Junzhou
The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through Graph Neural Networks (GNNs). It is well known that both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model shall be able to exploit both node (atom) and edge (bond) information simultaneously. Guided by this observation, we present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture to enable more accurate predictions of molecular properties. In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process. This readout component also renders the whole architecture interpretable. We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme that enhances information communication of the two views, which results in the MV-GNN^cross variant. Lastly, we theoretically justify the expressiveness of the two proposed models in terms of distinguishing non-isomorphism graphs. Extensive experiments demonstrate that MV-GNN models achieve remarkably superior performance over the state-of-the-art models on a variety of challenging benchmarks. Meanwhile, visualization results of the node importance are consistent with prior knowledge, which confirms the interpretability power of MV-GNN models.
Re-balancing Variational Autoencoder Loss for Molecule Sequence Generation
Yan, Chaochao, Wang, Sheng, Yang, Jinyu, Xu, Tingyang, Huang, Junzhou
Molecule generation is to design new molecules with specific chemical properties and further to optimize the desired chemical properties. Following previous work, we encode molecules into continuous vectors in the latent space and then decode the vectors into molecules under the variational autoencoder (VAE) framework. We investigate the posterior collapse problem of current RNN-based VAEs for molecule sequence generation. For the first time, we find that underestimated reconstruction loss leads to posterior collapse, and provide both theoretical and experimental evidence. We propose an effective and efficient solution to fix the problem and avoid posterior collapse. Without bells and whistles, our method achieves SOTA reconstruction accuracy and competitive validity on the ZINC 250K dataset. When generating 10,000 unique valid SMILES from random prior sampling, it costs JT-VAE1450s while our method only needs 9s. Our implementation is at https://github.com/chaoyan1037/Re-balanced-VAE.
The General Black-box Attack Method for Graph Neural Networks
Chang, Heng, Rong, Yu, Xu, Tingyang, Huang, Wenbing, Zhang, Honglei, Cui, Peng, Zhu, Wenwu, Huang, Junzhou
With the great success of Graph Neural Networks (GNNs) towards representation learning on graph-structure data, the robustness of GNNs against adversarial attack inevitably becomes a central problem in graph learning domain. Regardless of the fruitful progress, current works suffer from two main limitations: First, the attack method required to be developed case by case; Second, most of them are restricted to the white-box attack. This paper promotes current frameworks in a more general and flexible sense -- we demand only one single method to attack various kinds of GNNs and this attacker is black box driven. To this end, we begin by investigating the theoretical connections between different kinds of GNNs in a principled way and integrate different GNN models into a unified framework, dubbed as General Spectral Graph Convolution. As such, a generalized adversarial attacker is proposed towards two families of GNNs: Convolution-based model and sampling-based model. More interestingly, our attacker does not require any knowledge of the target classifiers used in GNNs. Extensive experimental results validate the effectiveness of our method on several benchmark datasets. Particularly by using our attack, even small graph perturbations like one-edge flip is able to consistently make a strong attack in performance to different GNN models.
The Truly Deep Graph Convolutional Networks for Node Classification
Rong, Yu, Huang, Wenbing, Xu, Tingyang, Huang, Junzhou
Existing Graph Convolutional Networks (GCNs) are shallow---the number of the layers is usually not larger than 2. The deeper variants by simply stacking more layers, unfortunately perform worse, even involving well-known tricks like weight penalizing, dropout, and residual connections. This paper reveals that developing deep GCNs mainly encounters two obstacles: \emph{over-fitting} and \emph{over-smoothing}. The over-fitting issue weakens the generalization ability on small graphs, while over-smoothing impedes model training by isolating output representations from the input features with the increase in network depth. Hence, we propose DropEdge, a novel technique to alleviate both issues. At its core, DropEdge randomly removes a certain number of edges from the input graphs, acting like a data augmenter and also a message passing reducer. More importantly, DropEdge enables us to recast a wider range of Convolutional Neural Networks (CNNs) from the image field to the graph domain; in particular, we study DenseNet and InceptionNet in this paper. Extensive experiments on several benchmarks demonstrate that our method allows deep GCNs to achieve promising performance, even when the number of layers exceeds 30---the deepest GCN that has ever been proposed.
Latent Sparse Modeling of Longitudinal Multi-Dimensional Data
Chen, Ko-Shin (University of Connecticut) | Xu, Tingyang (Tencent Technology Co., Ltd) | Bi, Jinbo (University of Connecticut)
We propose a tensor-based approach to analyze multi-dimensional data describing sample subjects. It simultaneously discovers patterns in features and reveals past temporal points that have impact on current outcomes. The model coefficient, a k-mode tensor, is decomposed into a summation of k tensors of the same dimension. To accomplish feature selection, we introduce the tensor '"atent L F,1 norm" as a grouped penalty in our formulation. Furthermore, the proposed model takes into account within-subject correlations by developing a tensor-based quadratic inference function. We provide an asymptotic analysis of our model when the sample size approaches to infinity. To solve the corresponding optimization problem, we develop a linearized block coordinate descent algorithm and prove its convergence for a fixed sample size. Computational results on synthetic datasets and real-file fMRI and EEG problems demonstrate the superior performance of the proposed approach over existing techniques.