AITopics | Feng, Fuli

Collaborating Authors

Feng, Fuli

On the Equivalence of Decoupled Graph Convolution Network and Label Propagation

Dong, Hande, Chen, Jiawei, Feng, Fuli, He, Xiangnan, Bi, Shuxian, Ding, Zhaolin, Cui, Peng

arXiv.org Machine Learning, Oct-23-2020

The original design of Graph Convolution Network (GCN) couples feature transformation and neighborhood aggregation for node representation learning. Recent work shows that coupling is inferior to decoupling, which supports deep graph propagation and has become the latest paradigm of GCN (e.g., APPNP and SGCN). Despite its effectiveness, the working mechanisms of the decoupled GCN are not well understood. In this paper, we explore the decoupled GCN for semi-supervised node classification from a novel and fundamental perspective: label propagation. We conduct thorough theoretical analyses, proving that the decoupled GCN is essentially the same as two-step label propagation: first, propagating the known labels along the graph to generate pseudo-labels for the unlabeled nodes, and second, training normal neural network classifiers on the augmented pseudo-labeled data. More interestingly, we reveal why the decoupled GCN is effective: going beyond conventional label propagation, it can automatically assign structure- and model-aware weights to the pseudo-labeled data. This explains why the decoupled GCN is relatively robust to structure noise and over-smoothing, but sensitive to label noise and model initialization. Based on this insight, we propose a new label propagation method named Propagation then Training Adaptively (PTA), which overcomes the flaws of the decoupled GCN with a dynamic and adaptive weighting strategy. Our PTA is simple yet more effective and robust than the decoupled GCN. We empirically validate our findings on four benchmark datasets, demonstrating the advantages of our method.
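The first of the two steps described in this abstract, propagating known labels to produce pseudo-labels, can be illustrated with a minimal sketch. It is not the authors' PTA code: the function names, the row-normalized adjacency with self-loops, and the `alpha` and `steps` values are all assumptions chosen for illustration, and the adaptive weighting the abstract mentions is not modeled.

```python
def propagate_labels(neighbors, seed_labels, num_classes, alpha=0.1, steps=10):
    """Soft pseudo-labels via Y <- (1 - alpha) * A_hat @ Y + alpha * Y0.

    neighbors: dict mapping each node to a list of its neighbors (undirected).
    seed_labels: dict mapping labeled nodes to their known class index.
    A_hat is the row-normalized adjacency with self-loops (an assumption).
    """
    nodes = sorted(neighbors)
    # Y0 holds one-hot rows for labeled nodes and zeros elsewhere.
    y0 = {v: [0.0] * num_classes for v in nodes}
    for v, c in seed_labels.items():
        y0[v][c] = 1.0
    y = {v: row[:] for v, row in y0.items()}
    for _ in range(steps):
        new_y = {}
        for v in nodes:
            hood = [v] + list(neighbors[v])  # self-loop plus neighbors
            new_y[v] = [
                (1 - alpha) * sum(y[u][c] for u in hood) / len(hood)
                + alpha * y0[v][c]
                for c in range(num_classes)
            ]
        y = new_y
    return y


def hard_pseudo_labels(soft, seed_labels):
    """Argmax pseudo-label for every node without a known label."""
    return {
        v: max(range(len(p)), key=p.__getitem__)
        for v, p in soft.items()
        if v not in seed_labels
    }


# Path graph 0 - 1 - 2 - 3 with the two end nodes labeled.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
seeds = {0: 0, 3: 1}
soft = propagate_labels(path, seeds, num_classes=2)
pseudo = hard_pseudo_labels(soft, seeds)
```

In the second step, the soft or hard pseudo-labels would serve as additional training targets for an ordinary classifier on the node features.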

decoupled gcn, neural network, survey article, (18 more...)

2010.12408

Country:

North America > United States (0.71)
Asia > China > Anhui Province (0.14)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)

Should Graph Convolution Trust Neighbors? A Simple Causal Inference Method

Feng, Fuli, Huang, Weiran, Xin, Xin, He, Xiangnan, Chua, Tat-Seng

arXiv.org Machine Learning, Oct-22-2020

Recent studies on Graph Convolutional Networks (GCNs) reveal the usefulness of adaptive locality, which enables adjusting the contribution of a neighbor to the target node representation. Existing work typically achieves adaptive locality by introducing an additional module such as graph attention, which learns to weigh neighbor nodes. However, such a module may not work well in practice, since fitting the training data well does not necessarily lead to reasonable adaptive locality, especially when labeled data are scarce. In an orthogonal direction, this work explores how to achieve adaptive locality in the model inference stage, a new perspective that has received little scrutiny. The main advantage of leaving the training stage unchanged is generality: the method can be applied to most GCNs to improve their inference accuracy. Given a trained GCN model, the idea is to make a counterfactual prediction by blocking the graph structure, i.e., forcing the model to use each node's own features to predict its label. By comparing the real prediction with the counterfactual prediction, we can assess the trustworthiness of neighbor nodes. Furthermore, we explore graph uncertainty, which measures how the prediction would vary with changes to the graph structure, and introduce edge dropout into the inference stage to estimate it. We conduct empirical studies on seven node classification datasets to validate the effectiveness of our methods.
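The counterfactual idea in this abstract, predicting once with the graph and once with the graph blocked, can be sketched on a toy model. Everything below is hypothetical: the "trained model" is a single mean-aggregation layer whose aggregated features are read directly as class scores, and the abstract does not say how the two predictions are combined, so the sketch only reports where they disagree.

```python
def predict(features, neighbors, use_graph=True):
    """Argmax class per node from a toy mean-aggregation layer.

    With use_graph=False the neighborhood is blocked, so each node is
    classified from its own features only (the counterfactual prediction).
    """
    preds = {}
    for v, x in features.items():
        hood = [v] + (list(neighbors[v]) if use_graph else [])
        scores = [
            sum(features[u][c] for u in hood) / len(hood)
            for c in range(len(x))
        ]
        preds[v] = max(range(len(scores)), key=scores.__getitem__)
    return preds


# Node 0 looks like class 0 on its own, but both neighbors look like class 1.
features = {0: [0.8, 0.2], 1: [0.0, 1.0], 2: [0.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}

factual = predict(features, neighbors, use_graph=True)
counterfactual = predict(features, neighbors, use_graph=False)

# Nodes whose prediction flips once the neighbors are removed.
disagree = [v for v in features if factual[v] != counterfactual[v]]
```

A node in `disagree` is one whose label is decided by its neighbors rather than by its own features, which is where the question of whether to trust the neighbors becomes relevant.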

deep learning, neural network, prediction, (17 more...)

2010.11797

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CatGCN: Graph Convolutional Networks with Categorical Node Features

Chen, Weijian, Feng, Fuli, Wang, Qifan, He, Xiangnan, Song, Chonggang, Ling, Guohui, Zhang, Yongdong

arXiv.org Machine Learning, Sep-17-2020

Recent studies on Graph Convolutional Networks (GCNs) reveal that the initial node representations (i.e., the node representations before the first-time graph convolution) largely affect the final model performance. However, when learning the initial representation for a node, most existing work linearly combines the embeddings of node features, without considering the interactions among the features (or feature embeddings). We argue that when the node features are categorical, e.g., in many real-world applications like user profiling and recommender system, feature interactions usually carry important signals for predictive analytics. Ignoring them will result in suboptimal initial node representation and thus weaken the effectiveness of the follow-up graph convolution. In this paper, we propose a new GCN model named CatGCN, which is tailored for graph learning when the node features are categorical. Specifically, we integrate two ways of explicit interaction modeling into the learning of initial node representation, i.e., local interaction modeling on each pair of node features and global interaction modeling on an artificial feature graph. We then refine the enhanced initial node representations with the neighborhood aggregation-based graph convolution. We train CatGCN in an end-to-end fashion and demonstrate it on semi-supervised node classification. Extensive experiments on three tasks of user profiling (the prediction of user age, city, and purchase level) from Tencent and Alibaba datasets validate the effectiveness of CatGCN, especially the positive effect of performing feature interaction modeling before graph convolution.
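To make "local interaction modeling on each pair of node features" concrete, the sketch below sums the element-wise products of every pair of feature embeddings, a common factorization-machine-style construction. Whether CatGCN uses exactly this form is an assumption; the function names and the example embeddings are hypothetical.

```python
from itertools import combinations


def pairwise_interaction_naive(embeddings):
    """Sum of element-wise products over all pairs of feature embeddings."""
    dim = len(embeddings[0])
    out = [0.0] * dim
    for a, b in combinations(embeddings, 2):
        for k in range(dim):
            out[k] += a[k] * b[k]
    return out


def pairwise_interaction_fast(embeddings):
    """Same quantity in linear time: 0.5 * ((sum e)^2 - sum e^2)."""
    dim = len(embeddings[0])
    out = []
    for k in range(dim):
        total = sum(e[k] for e in embeddings)
        total_sq = sum(e[k] * e[k] for e in embeddings)
        out.append(0.5 * (total * total - total_sq))
    return out


# Three categorical features of one node, each with a 2-d embedding.
field_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
naive = pairwise_interaction_naive(field_embeddings)
fast = pairwise_interaction_fast(field_embeddings)
```

The resulting vector could be added to the plain sum of embeddings to form an interaction-aware initial node representation before graph convolution.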

information management, interaction, neural network, (19 more...)

2009.05303

Country: Asia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(3 more...)

Data Augmentation View on Graph Convolutional Network and the Proposal of Monte Carlo Graph Learning

Dong, Hande, Ding, Zhaolin, He, Xiangnan, Feng, Fuli, Bi, Shuxian

arXiv.org Machine Learning, Jun-23-2020

Today, there are two major understandings of graph convolutional networks, in the spectral domain and in the spatial domain, but both lack transparency. In this work, we introduce a new understanding, data augmentation, which is more transparent than the previous ones. Inspired by it, we propose a new graph learning paradigm, Monte Carlo Graph Learning (MCGL). MCGL has two core steps: (1) Data augmentation: propagate the labels of the training set through the graph structure and expand the training set; (2) Model training: use the expanded training set to train traditional classifiers. We use synthetic datasets to compare the strengths of MCGL and the graph convolutional operation on clean graphs. In addition, we show that MCGL's tolerance to graph structure noise is weaker than that of GCN on noisy graphs (four real-world datasets). Moreover, inspired by MCGL, we re-analyze why the performance of GCN degrades when it is deepened too much: rather than the mainstream view of over-smoothing, we argue that the main reason is graph structure noise, and we experimentally verify this view. The code is available at https://github.com/DongHande/MCGL.
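The data augmentation step can be sketched as follows. This is one plausible reading, not the code in the linked repository: short random walks as the sampling mechanism, along with the function name and the `walks_per_node`, `walk_length`, and `seed` parameters, are assumptions. Each walk starts at a labeled node and passes that node's label to wherever the walk ends.

```python
import random


def expand_training_set(neighbors, train_labels, walks_per_node=5,
                        walk_length=2, seed=0):
    """Return (node, label) pairs sampled by short random walks.

    neighbors: dict mapping each node to a list of its neighbors.
    train_labels: dict mapping labeled nodes to their labels.
    """
    rng = random.Random(seed)
    augmented = []
    for start, label in train_labels.items():
        for _ in range(walks_per_node):
            node = start
            for _ in range(walk_length):
                if not neighbors[node]:
                    break  # isolated node: the walk stays put
                node = rng.choice(neighbors[node])
            # The end node inherits the label of the walk's start node.
            augmented.append((node, label))
    return augmented


# Two disconnected components: a triangle {0, 1, 2} and an edge {3, 4}.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
train = {0: "A", 3: "B"}
augmented = expand_training_set(graph, train)
```

The pairs in `augmented` would then be used, together with the node features, to train an ordinary classifier. Because labels only travel along edges, a noisy edge can carry a wrong label to a node, which matches the abstract's remark about sensitivity to graph structure noise.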

artificial intelligence, graph, machine learning, (15 more...)

2006.13090

Country:

Asia > Singapore (0.14)
Asia > China (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Graph Adversarial Training: Dynamically Regularizing Based on Graph Structure

Feng, Fuli, He, Xiangnan, Tang, Jie, Chua, Tat-Seng

arXiv.org Machine Learning, Feb-20-2019

Recent efforts show that neural networks are vulnerable to small but intentional perturbations on input features in visual classification tasks. Due to the additional consideration of connections between examples (e.g., articles with a citation link tend to be in the same class), graph neural networks could be more sensitive to perturbations, since the perturbations from connected examples exacerbate the impact on a target example. Adversarial Training (AT), a dynamic regularization technique, can resist worst-case perturbations on input features and is a promising choice for improving model robustness and generalization. However, existing AT methods focus on standard classification and are less effective when training models on graphs, since they do not model the impact from connected examples. In this work, we explore adversarial training on graphs, aiming to improve the robustness and generalization of models learned on graphs. We propose Graph Adversarial Training (GAT), which takes the impact from connected examples into account when learning to construct and resist perturbations. We give a general formulation of GAT, which can be seen as a dynamic regularization scheme based on the graph structure. To demonstrate the utility of GAT, we apply it to a state-of-the-art graph neural network model, Graph Convolutional Network (GCN). We conduct experiments on two citation graphs (Citeseer and Cora) and a knowledge graph (NELL), verifying the effectiveness of GAT, which outperforms normal training on GCN by 4.51% in node classification accuracy. Code will be released upon acceptance.

deep learning, neural network, perturbation, (20 more...)

1902.08226

Country: Asia > China (0.46)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Discrete Factorization Machines for Fast Feature-based Recommendation

Liu, Han, He, Xiangnan, Feng, Fuli, Nie, Liqiang, Liu, Rui, Zhang, Hanwang

arXiv.org Machine Learning, May-6-2018

User and item features of side information are crucial for accurate recommendation. However, the large number of feature dimensions, e.g., usually larger than 10^7, results in expensive storage and computational cost. This prohibits fast recommendation especially on mobile applications where the computational resource is very limited. In this paper, we develop a generic feature-based recommendation model, called Discrete Factorization Machine (DFM), for fast and accurate recommendation. DFM binarizes the real-valued model parameters (e.g., float32) of every feature embedding into binary codes (e.g., boolean), and thus supports efficient storage and fast user-item score computation. To avoid the severe quantization loss of the binarization, we propose a convergent updating rule that resolves the challenging discrete optimization of DFM. Through extensive experiments on two real-world datasets, we show that 1) DFM consistently outperforms state-of-the-art binarized recommendation models, and 2) DFM shows very competitive performance compared to its real-valued version (FM), demonstrating the minimized quantization loss.
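The storage and speed argument for binary codes can be illustrated with a small sketch. It is not the DFM training procedure: taking the sign of a real-valued vector is a naive stand-in for the learned codes (the abstract proposes a convergent updating rule precisely to avoid the quantization loss of such post-hoc binarization), and the function names and example vectors are hypothetical.

```python
def binarize(vector):
    """Pack the signs of a real-valued vector into an int.

    Bit i is set when vector[i] >= 0, representing +1; a cleared bit is -1.
    """
    code = 0
    for i, x in enumerate(vector):
        if x >= 0:
            code |= 1 << i
    return code


def binary_dot(code_a, code_b, num_bits):
    """Dot product of two {-1, +1} codes: num_bits - 2 * Hamming distance."""
    hamming = bin(code_a ^ code_b).count("1")
    return num_bits - 2 * hamming


user = [0.3, -1.2, 0.7, -0.1]   # signs: + - + -
item = [0.5, 0.4, -0.9, -2.0]   # signs: + + - -
score = binary_dot(binarize(user), binarize(item), num_bits=4)

# Reference value computed directly on the +1/-1 vectors.
sign = lambda x: 1 if x >= 0 else -1
reference = sum(sign(u) * sign(v) for u, v in zip(user, item))
```

Each embedding is stored as a single integer instead of a list of floats, and the inner product reduces to an XOR followed by a bit count.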

artificial intelligence, optimization problem, recommendation, (18 more...)

1805.02232

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
