Collaborating Authors


Introduction to GraphSAGE in Python


GraphSAGE is a Graph Neural Network designed to manage billions of nodes and edges thanks to neighbor sampling and a fast aggregator.

SUGAR: Efficient Subgraph-level Training via Resource-aware Graph Partitioning Artificial Intelligence

Graph Neural Networks (GNNs) have demonstrated a great potential in a variety of graph-based applications, such as recommender systems, drug discovery, and object recognition. Nevertheless, resource-efficient GNN learning is a rarely explored topic despite its many benefits for edge computing and Internet of Things (IoT) applications. To improve this state of affairs, this work proposes efficient subgraph-level training via resource-aware graph partitioning (SUGAR). SUGAR first partitions the initial graph into a set of disjoint subgraphs and then performs local training at the subgraph-level. We provide a theoretical analysis and conduct extensive experiments on five graph benchmarks to verify its efficacy in practice. Our results show that SUGAR can achieve up to 33 times runtime speedup and 3.8 times memory reduction on large-scale graphs. We believe SUGAR opens a new research direction towards developing GNN methods that are resource-efficient, hence suitable for IoT deployment.

GraphDCA -- a Framework for Node Distribution Comparison in Real and Synthetic Graphs Artificial Intelligence

We argue that when comparing two graphs, the distribution of node structural features is more informative than global graph statistics which are often used in practice, especially to evaluate graph generative models. Thus, we present GraphDCA - a framework for evaluating similarity between graphs based on the alignment of their respective node representation sets. The sets are compared using a recently proposed method for comparing representation spaces, called Delaunay Component Analysis (DCA), which we extend to graph data. To evaluate our framework, we generate a benchmark dataset of graphs exhibiting different structural patterns and show, using three node structure feature extractors, that GraphDCA recognizes graphs with both similar and dissimilar local structure. We then apply our framework to evaluate three publicly available real-world graph datasets and demonstrate, using gradual edge perturbations, that GraphDCA satisfyingly captures gradually decreasing similarity, unlike global statistics. Finally, we use GraphDCA to evaluate two state-of-the-art graph generative models, NetGAN and CELL, and conclude that further improvements are needed for these models to adequately reproduce local structural features.

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective Artificial Intelligence

First, explosive increase of graph data poses a great challenge to GNN training on large-scale datasets. Previously, Graph neural networks (GNNs) have been a hot many graph-based tasks were often conducted on toy datasets spot of recent research and are widely utilized in diverse that are relatively small compared to graphs in realistic applications, applications. However, with the use of huger which is harmful to model scalability and practical data and deeper models, an urgent demand is unsurprisingly usages. Currently, large-scale graph datasets are thereby proposed made to accelerate GNNs for more efficient in literature [Hu et al., 2020a] for advanced research, execution. In this paper, we provide a comprehensive and at the same time, making GNNs execution (i.e., training survey on acceleration methods for GNNs and inference) a time-consuming process.


AAAI Conferences

Influence maximization plays a key role in social network viral marketing. Although the problem has been widely studied, it is still challenging to estimate influence spread in big networks with hundreds of millions of nodes. Existing heuristic algorithms and greedy algorithms incur heavy computation cost in big networks and are incapable of processing dynamic network structures. In this paper, we propose an incremental algorithm for influence spread estimation in big networks. The incremental algorithm breaks down big networks into small subgraphs ad continuously estimate influence spread on these subgraphs as data streams. The challenge of the incremental algorithm is that subgraphs derived from a big network are not independent and MC simulations on each subgraph (defined as snapshots) may conflict with each other. In this paper, we assume that different combinations of MC simulations on subgraphs on subgraphs generate independent samples. In so doing, the incremental algorithm on streaming subgraphs can estimate influence spread with fewer simulations. Experimental results demonstrates the performance of the proposed algorithm.


AAAI Conferences

A Knowledge Graph (KG), popularly used in both industry and academia, is an effective representation of knowledge. It consists of a collection of knowledge elements, each of which in turn is extracted from the web or other sources. Information extractors that use natural language processing techniques or other complex algorithms are usually noisy. That is, the vast number of knowledge elements extracted from the web may not only be associated with different confidence values but may also be inconsistent with each other. Many applications such as question answering systems that are built on top of large-scale KGs are required to reason efficiently about these confidence values and inconsistencies.

Link Prediction with Contextualized Self-Supervision Artificial Intelligence

Link prediction aims to infer the existence of a link between two nodes in a network. Despite their wide application, the success of traditional link prediction algorithms is hindered by three major challenges -- link sparsity, node attribute noise and network dynamics -- that are faced by real-world networks. To overcome these challenges, we propose a Contextualized Self-Supervised Learning (CSSL) framework that fully exploits structural context prediction for link prediction. The proposed CSSL framework forms edge embeddings through aggregating pairs of node embeddings constructed via a transformation on node attributes, which are used to predict the link existence probability. To generate node embeddings tailored for link prediction, structural context prediction is leveraged as a self-supervised learning task to boost link prediction. Two types of structural contexts are investigated, i.e., context nodes collected from random walks vs. context subgraphs. The CSSL framework can be trained in an end-to-end manner, with the learning of node and edge embeddings supervised by link prediction and the self-supervised learning task. The proposed CSSL is a generic and flexible framework in the sense that it can handle both transductive and inductive link prediction settings, and both attributed and non-attributed networks. Extensive experiments and ablation studies on seven real-world benchmark graph datasets demonstrate the superior performance of the proposed self-supervision based link prediction algorithm over state-of-the-art baselines on different types of networks under both transductive and inductive settings. The proposed CSSL also yields competitive performance in terms of its robustness to node attribute noise and scalability over large-scale networks.

Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching Artificial Intelligence

Subgraph matching is a fundamental problem in various fields that use graph structured data. Subgraph matching algorithms enumerate all isomorphic embeddings of a query graph q in a data graph G. An important branch of matching algorithms exploit the backtracking search approach which recursively extends intermediate results following a matching order of query vertices. It has been shown that the matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms. In recent years, many advanced techniques for query vertex ordering (i.e., matching order generation) have been proposed to reduce the unpromising intermediate results according to the preset heuristic rules. In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms. Instead of using the fixed heuristics to generate the matching order, our model could capture and make full use of the graph information, and thus determine the query vertex order with the adaptive learning-based rule that could significantly reduces the number of redundant enumerations. With the help of the reinforcement learning framework, our model is able to consider the long-term benefits rather than only consider the local information at current ordering step.Extensive experiments on six real-life data graphs demonstrate that our proposed matching order generation technique could reduce up to two orders of magnitude of query processing time compared to the state-of-the-art algorithms.

Informative Pseudo-Labeling for Graph Neural Networks with Few Labels Artificial Intelligence

Graph Neural Networks (GNNs) have achieved state-of-the-art results for semi-supervised node classification on graphs. Nevertheless, the challenge of how to effectively learn GNNs with very few labels is still under-explored. As one of the prevalent semi-supervised methods, pseudo-labeling has been proposed to explicitly address the label scarcity problem. It aims to augment the training set with pseudo-labeled unlabeled nodes with high confidence so as to re-train a supervised model in a self-training cycle. However, the existing pseudo-labeling approaches often suffer from two major drawbacks. First, they tend to conservatively expand the label set by selecting only high-confidence unlabeled nodes without assessing their informativeness. Unfortunately, those high-confidence nodes often convey overlapping information with given labels, leading to minor improvements for model re-training. Second, these methods incorporate pseudo-labels to the same loss function with genuine labels, ignoring their distinct contributions to the classification task. In this paper, we propose a novel informative pseudo-labeling framework, called InfoGNN, to facilitate learning of GNNs with extremely few labels. Our key idea is to pseudo label the most informative nodes that can maximally represent the local neighborhoods via mutual information maximization. To mitigate the potential label noise and class-imbalance problem arising from pseudo labeling, we also carefully devise a generalized cross entropy loss with a class-balanced regularization to incorporate generated pseudo labels into model re-training. Extensive experiments on six real-world graph datasets demonstrate that our proposed approach significantly outperforms state-of-the-art baselines and strong self-supervised methods on graphs.

Decoupling the Depth and Scope of Graph Neural Networks Artificial Intelligence

State-of-the-art Graph Neural Networks (GNNs) have limited scalability with respect to the graph and model sizes. On large graphs, increasing the model depth often means exponential expansion of the scope (i.e., receptive field). Beyond just a few layers, two fundamental challenges emerge: 1. degraded expressivity due to oversmoothing, and 2. expensive computation due to neighborhood explosion. We propose a design principle to decouple the depth and scope of GNNs -- to generate representation of a target entity (i.e., a node or an edge), we first extract a localized subgraph as the bounded-size scope, and then apply a GNN of arbitrary depth on top of the subgraph. A properly extracted subgraph consists of a small number of critical neighbors, while excluding irrelevant ones. The GNN, no matter how deep it is, smooths the local neighborhood into informative representation rather than oversmoothing the global graph into "white noise". Theoretically, decoupling improves the GNN expressive power from the perspectives of graph signal processing (GCN), function approximation (GraphSAGE) and topological learning (GIN). Empirically, on seven graphs (with up to 110M nodes) and six backbone GNN architectures, our design achieves significant accuracy improvement with orders of magnitude reduction in computation and hardware cost.