Goto

Collaborating Authors

 unlabeled graph



Data-Centric Learning from Unlabeled Graphs with Diffusion Model

Neural Information Processing Systems

Graph property prediction tasks are important and numerous. While each task offers a small size of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is training a model with the unlabeled graphs on self-supervised tasks and then fine-tuning the model on the prediction tasks. However, the self-supervised task knowledge could not be aligned or sometimes conflicted with what the predictions needed. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fifteen existing various methods on fifteen tasks. The performance improvement brought by unlabeled data is visible as the generated labeled examples unlike the self-supervised learning.



Data-Centric Learning from Unlabeled Graphs with Diffusion Model

Neural Information Processing Systems

Graph property prediction tasks are important and numerous. While each task offers a small size of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is training a model with the unlabeled graphs on self-supervised tasks and then fine-tuning the model on the prediction tasks. However, the self-supervised task knowledge could not be aligned or sometimes conflicted with what the predictions needed. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels.


Hypergraph-enhanced Dual Semi-supervised Graph Classification

Ju, Wei, Mao, Zhengyang, Yi, Siyu, Qin, Yifang, Gu, Yiyang, Xiao, Zhiping, Wang, Yifan, Luo, Xiao, Zhang, Ming

arXiv.org Artificial Intelligence

In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. Despite the promising capability of graph neural networks (GNNs), they typically require a large number of costly labeled graphs, while a wealth of unlabeled graphs fail to be effectively utilized. Moreover, GNNs are inherently limited to encoding local neighborhood information using message-passing mechanisms, thus lacking the ability to model higher-order dependencies among nodes. To tackle these challenges, we propose a Hypergraph-Enhanced DuAL framework named HEAL for semi-supervised graph classification, which captures graph semantics from the perspective of the hypergraph and the line graph, respectively. Specifically, to better explore the higher-order relationships among nodes, we design a hypergraph structure learning to adaptively learn complex node dependencies beyond pairwise relations. Meanwhile, based on the learned hypergraph, we introduce a line graph to capture the interaction between hyperedges, thereby better mining the underlying semantic structures. Finally, we develop a relational consistency learning to facilitate knowledge transfer between the two branches and provide better mutual guidance. Extensive experiments on real-world graph datasets verify the effectiveness of the proposed method against existing state-of-the-art methods.


Data-Centric Learning from Unlabeled Graphs with Diffusion Model

Liu, Gang, Inae, Eric, Zhao, Tong, Xu, Jiaxin, Luo, Tengfei, Jiang, Meng

arXiv.org Artificial Intelligence

Graph property prediction tasks are important and numerous. While each task offers a small size of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is training a model with the unlabeled graphs on self-supervised tasks and then fine-tuning the model on the prediction tasks. However, the self-supervised task knowledge could not be aligned or sometimes conflicted with what the predictions needed. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fifteen existing various methods on fifteen tasks. The performance improvement brought by unlabeled data is visible as the generated labeled examples unlike the self-supervised learning.


Semi-Supervised Graph Imbalanced Regression

Liu, Gang, Zhao, Tong, Inae, Eric, Luo, Tengfei, Jiang, Meng

arXiv.org Artificial Intelligence

Data imbalance is easily found in annotated data when the observations of certain continuous label values are difficult to collect for regression tasks. When they come to molecule and polymer property predictions, the annotated graph datasets are often small because labeling them requires expensive equipment and effort. To address the lack of examples of rare label values in graph regression tasks, we propose a semi-supervised framework to progressively balance training data and reduce model bias via self-training. The training data balance is achieved by (1) pseudo-labeling more graphs for under-represented labels with a novel regression confidence measurement and (2) augmenting graph examples in latent space for remaining rare labels after data balancing with pseudo-labels. The former is to identify quality examples from unlabeled data whose labels are confidently predicted and sample a subset of them with a reverse distribution from the imbalanced annotated data. The latter collaborates with the former to target a perfect balance using a novel label-anchored mixup algorithm. We perform experiments in seven regression tasks on graph datasets. Results demonstrate that the proposed framework significantly reduces the error of predicted graph properties, especially in under-represented label areas.


Dynamic Labeling for Unlabeled Graph Neural Networks

Sun, Zeyu, Zhang, Wenjie, Mou, Lili, Zhu, Qihao, Xiong, Yingfei, Zhang, Lu

arXiv.org Artificial Intelligence

Existing graph neural networks (GNNs) largely rely on node embeddings, which represent a node as a vector by its identity, type, or content. However, graphs with unlabeled nodes widely exist in real-world applications (e.g., anonymized social networks). Previous GNNs either assign random labels to nodes (which introduces artefacts to the GNN) or assign one embedding to all nodes (which fails to distinguish one node from another). In this paper, we analyze the limitation of existing approaches in two types of classification tasks, graph classification and node classification. Inspired by our analysis, we propose two techniques, Dynamic Labeling and Preferential Dynamic Labeling, that satisfy desired properties statistically or asymptotically for each type of the task. Experimental results show that we achieve high performance in various graph-related tasks.


Matching Node Embeddings for Graph Similarity

Nikolentzos, Giannis (Ecole Polytechnique and Athens University of Economics and Business) | Meladianos, Polykarpos (Ecole Polytechnique and Athens University of Economics and Business) | Vazirgiannis, Michalis (Ecole Polytechnique and Athens University of Economics and Business)

AAAI Conferences

Graph kernels have emerged as a powerful tool for graph comparison. Most existing graph kernels focus on local properties of graphs and ignore global structure. In this paper, we compare graphs based on their global properties as these are captured by the eigenvectors of their adjacency matrices. We present two algorithms for both labeled and unlabeled graph comparison. These algorithms represent each graph as a set of vectors corresponding to the embeddings of its vertices. The similarity between two graphs is then determined using the Earth Mover's Distance metric. These similarities do not yield a positive semidefinite matrix. To address for this, we employ an algorithm for SVM classification using indefinite kernels. We also present a graph kernel based on the Pyramid Match kernel that finds an approximate correspondence between the sets of vectors of the two graphs. We further improve the proposed kernel using the Weisfeiler-Lehman framework. We evaluate the proposed methods on several benchmark datasets for graph classification and compare their performance to state-of-the-art graph kernels. In most cases, the proposed algorithms outperform the competing methods, while their time complexity remains very attractive.


Enumerating Markov Equivalence Classes of Acyclic Digraph Models

Gillispie, Steven B., Perlman, Michael D.

arXiv.org Artificial Intelligence

Graphical Markov models determined by acyclic digraphs (ADGs), also called directed acyclic graphs (DAGs), are widely studied in statistics, computer science (as Bayesian networks), operations research (as influence diagrams), and many related fields. Because different ADGs may determine the same Markov equivalence class, it long has been of interest to determine the efficiency gained in model specification and search by working directly with Markov equivalence classes of ADGs rather than with ADGs themselves. A computer program was written to enumerate the equivalence classes of ADG models as specified by Pearl & Verma's equivalence criterion. The program counted equivalence classes for models up to and including 10 vertices. The ratio of number of classes to ADGs appears to approach an asymptote of about 0.267. Classes were analyzed according to number of edges and class size. By edges, the distribution of number of classes approaches a Gaussian shape. By class size, classes of size 1 are most common, with the proportions for larger sizes initially decreasing but then following a more irregular pattern. The maximum number of classes generated by any undirected graph was found to increase approximately factorially. The program also includes a new variation of orderly algorithm for generating undirected graphs.