Gan, Xiaoying
OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning
Lu, Bin, Zhao, Ze, Han, Luyu, Gan, Xiaoying, Zhou, Yuntao, Zhou, Lei, Fu, Luoyi, Wang, Xinbing, Zhou, Chenghu, Zhang, Jing
Accurately reconstructing global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystems. Existing expert-dominated numerical simulations fail to keep pace with the dynamic variation caused by global warming and human activities. Moreover, because data collection is costly, historical observations are severely sparse, which poses a major challenge for precise reconstruction. In this work, we propose OxyGenerator, the first deep-learning-based model to reconstruct global ocean deoxygenation from 1920 to 2023. Specifically, to address the heterogeneity across large temporal and spatial scales, we propose zoning-varying graph message-passing to capture the complex oceanographic correlations between missing values and sparse observations. Additionally, to further calibrate the uncertainty, we incorporate inductive bias from dissolved oxygen (DO) variations and chemical effects. Compared against in-situ DO observations, OxyGenerator significantly outperforms CMIP6 numerical simulations, reducing MAPE by 38.77% and demonstrating promising potential for understanding the "breathless ocean" in a data-driven manner.
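The MAPE comparison reported in this abstract can be illustrated with a minimal sketch. The abstract does not say whether the 38.77% figure is a relative or an absolute reduction; the code below assumes the relative reading. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every observed value is non-zero.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical dissolved-oxygen values, for illustration only.
observations = [200.0, 250.0, 100.0]
baseline_estimates = [220.0, 225.0, 120.0]  # stand-in for a numerical simulation
model_estimates = [210.0, 240.0, 110.0]     # stand-in for a learned model

baseline_mape = mape(observations, baseline_estimates)
model_mape = mape(observations, model_estimates)
reduction = relative_reduction(baseline_mape, model_mape)
```

Because MAPE divides by the observed value, observations close to zero dominate the average, so the metric is best read alongside the range of the observed quantity.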
AceMap: Knowledge Discovery through Academic Graph
Wang, Xinbing, Fu, Luoyi, Gan, Xiaoying, Wen, Ying, Zheng, Guanjie, Ding, Jiaxin, Xiang, Liyao, Ye, Nanyang, Jin, Meng, Liang, Shiyu, Lu, Bin, Wang, Haiwen, Xu, Yi, Deng, Cheng, Zhang, Shao, Kang, Huquan, Wang, Xingli, Li, Qi, Guo, Zhixin, Qi, Jiexing, Liu, Pan, Ren, Yuyang, Wu, Lyuwen, Yang, Jungang, Zhou, Jianping, Zhou, Chenghu
The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publications. The representation of heterogeneous graphs and the effective measurement, analysis, and mining of such graphs pose significant challenges. To address these challenges, we present AceMap, an academic system designed for knowledge discovery through an academic graph. We present advanced database construction techniques to build the comprehensive AceMap database with large-scale academic entities that contain rich visual, textual, and numerical information. AceMap also employs innovative visualization, quantification, and analysis methods to explore associations and logical relationships among academic entities. AceMap introduces large-scale academic network visualization techniques centered on nebular graphs, providing a comprehensive view of academic networks from multiple perspectives. In addition, AceMap proposes a unified metric based on structural entropy to quantitatively measure the knowledge content of different academic entities. Moreover, AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas through citation relationships and concept co-occurrence, and generating concise summaries informed by this evolutionary process. Furthermore, AceMap uses machine reading methods to generate potential new ideas at the intersection of different fields. Exploring the integration of large language models and knowledge graphs is a promising direction for future research in idea evolution. Please visit https://www.acemap.info for further exploration.
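The abstract states only that AceMap's metric is based on structural entropy and does not give its exact form. As a rough illustration of the underlying notion, the sketch below computes the simplest variant, the one-dimensional structural entropy of an undirected graph, which is the Shannon entropy of the degree distribution d_i / 2m. The function name and the toy graph are assumptions for illustration, not AceMap's implementation.

```python
from collections import Counter
from math import log2


def one_dimensional_structural_entropy(edges):
    """One-dimensional structural entropy of an undirected graph, in bits.

    edges: iterable of (u, v) pairs. With d_i the degree of node i and m
    the number of edges, this returns -sum_i (d_i / 2m) * log2(d_i / 2m).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    volume = sum(degree.values())  # equals 2m
    if volume == 0:
        return 0.0
    return -sum((d / volume) * log2(d / volume) for d in degree.values())


# Toy path graph a - b - c with degrees 1, 2, 1.
toy_entropy = one_dimensional_structural_entropy([("a", "b"), ("b", "c")])
```

Higher-dimensional structural entropy additionally depends on a partition of the nodes into communities, which this sketch does not cover.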
Temporal Generalization Estimation in Evolving Graphs
Lu, Bin, Ma, Tingyan, Gan, Xiaoying, Wang, Xinbing, Zhu, Yunqiang, Zhou, Chenghu, Liang, Shiyu
Graph Neural Networks (GNNs) are widely deployed in many fields, but they often struggle to maintain accurate representations as graphs evolve. We theoretically establish a lower bound, proving that under mild conditions, representation distortion inevitably occurs over time. To estimate the temporal distortion without human annotation after deployment, one naive approach is to pre-train a recurrent model (e.g., RNN) before deployment and use this model afterwards, but the estimation is far from satisfactory. In this paper, we analyze the representation distortion from an information theory perspective, and attribute it primarily to inaccurate feature extraction during evolution. The ablation studies underscore the necessity of graph reconstruction. For example, on the OGB-arXiv dataset, the estimation metric MAPE deteriorates from 2.19% to 8.00% without reconstruction. The rapid rise of Graph Neural Networks (GNNs) has led to their wide deployment in various applications. However, recent studies have uncovered a notable challenge: as the distribution of the graph shifts continuously after deployment, GNNs may suffer from representation distortion over time, which in turn leads to continuing performance degradation (Liang et al., 2018; Wu et al., 2022; Lu et al., 2023), as shown in Figure 1. This distribution shift may come from the continuous addition of nodes and edges, changes in network structure, or the introduction of new features. This issue becomes particularly salient in applications where the graph evolves rapidly over time.
Graph Out-of-Distribution Generalization with Controllable Data Augmentation
Lu, Bin, Gan, Xiaoying, Zhao, Ze, Liang, Shiyu, Fu, Luoyi, Wang, Xinbing, Zhou, Chenghu
Graph Neural Networks (GNNs) have demonstrated extraordinary performance in classifying graph properties. However, due to the selection bias of training and testing data (e.g., training on small graphs and testing on large graphs, or training on dense graphs and testing on sparse graphs), distribution deviation is widespread. More importantly, we often observe a hybrid structure distribution shift of both scale and density, despite a one-sided biased data partition. The spurious correlations under hybrid distribution deviation degrade the performance of previous GNN methods and cause large instability across different datasets. To alleviate this problem, we propose OOD-GMixup to jointly manipulate the training distribution with controllable data augmentation in metric space. Specifically, we first extract the graph rationales to eliminate the spurious correlations due to irrelevant information. Secondly, we generate virtual samples by perturbing the graph rationale representation domain to obtain potential OOD training samples. Finally, we propose OOD calibration to measure the distribution deviation of virtual samples by leveraging Extreme Value Theory, and further actively control the training distribution by emphasizing the impact of virtual OOD samples. Extensive studies on several real-world graph classification datasets demonstrate the superiority of our proposed method over state-of-the-art baselines.
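The abstract does not spell out how virtual samples are generated in the representation domain. The name OOD-GMixup suggests a mixup-style interpolation, so the sketch below shows generic mixup between two representation vectors and their one-hot labels. This is an assumption for illustration, not the authors' procedure; the function name, the Beta(alpha, alpha) sampling, and the example vectors are all hypothetical.

```python
import random


def mixup_representations(rep_a, rep_b, label_a, label_b, lam=None, alpha=1.0):
    """Generic mixup of two representation vectors and their label vectors.

    If lam is None, it is drawn from Beta(alpha, alpha), the usual choice
    in mixup. Returns (mixed_representation, mixed_label, lam).
    """
    if len(rep_a) != len(rep_b) or len(label_a) != len(label_b):
        raise ValueError("paired inputs must have equal length")
    if lam is None:
        lam = random.betavariate(alpha, alpha)
    mixed_rep = [lam * a + (1.0 - lam) * b for a, b in zip(rep_a, rep_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_rep, mixed_label, lam


# Hypothetical 2-dimensional graph representations with one-hot labels.
virtual_rep, virtual_label, used_lam = mixup_representations(
    [1.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 1.0], lam=0.25
)
```

In this generic form the virtual sample always lies on the segment between the two inputs. Measuring how far such samples deviate from the training distribution, which the abstract attributes to OOD calibration with Extreme Value Theory, is outside the scope of this sketch.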
High-Order Relation Construction and Mining for Graph Matching
Xu, Hui, Xiang, Liyao, Le, Youmin, Gan, Xiaoying, Jia, Yuting, Fu, Luoyi, Wang, Xinbing
Graph matching pairs corresponding nodes across two or more graphs. The problem is difficult because structural similarity across graphs is hard to capture, especially on large graphs. We propose to incorporate high-order information for matching large-scale graphs. Iterated line graphs are introduced for the first time to describe such high-order information, based on which we present a new graph matching method, called High-order Graph Matching Network (HGMN), to learn not only the local structural correspondence, but also the hyperedge relations across graphs. We theoretically prove that iterated line graphs are more expressive than graph convolution networks in terms of aligning nodes. By imposing practical constraints, HGMN is made scalable to large-scale graphs. Experimental results in a variety of settings show that HGMN achieves more accurate matching results than the state of the art, verifying that our method effectively captures the structural similarity across different graphs.
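The iterated line graphs mentioned in this abstract can be illustrated with the standard construction: each edge of the original graph becomes a node, and two such nodes are adjacent when the original edges share an endpoint. The sketch below applies that construction repeatedly to an edge list. It shows only the textbook definition, not HGMN itself; the function names and toy graphs are hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Edge list of the line graph of an undirected simple graph.

    Each original edge becomes a node (a sorted 2-tuple); two such nodes
    are connected when the original edges share an endpoint. Line-graph
    nodes with no neighbour do not appear in the returned edge list.
    """
    nodes = sorted({tuple(sorted(edge)) for edge in edges})
    return [(e, f) for e, f in combinations(nodes, 2) if set(e) & set(f)]


def iterated_line_graph(edges, k):
    """Apply the line-graph construction k times."""
    for _ in range(k):
        edges = line_graph(edges)
    return edges


path = [(1, 2), (2, 3), (3, 4)]
triangle = [(1, 2), (2, 3), (1, 3)]

first_order = iterated_line_graph(path, 1)
second_order = iterated_line_graph(path, 2)
triangle_line = line_graph(triangle)
```

The pairwise comparison is quadratic in the number of edges, which is acceptable for a toy example but not for large graphs; the abstract notes that HGMN relies on practical constraints to remain scalable.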