Zhao, Tianxiang
Enhance GNNs with Reliable Confidence Estimation via Adversarial Calibration Learning
Wang, Yilong, Zhang, Jiahao, Zhao, Tianxiang, Wang, Suhang
Despite their impressive predictive performance, GNNs often exhibit poor confidence calibration, i.e., their predicted confidence scores do not accurately reflect the true likelihood of correctness. This issue raises concerns about their reliability in high-stakes domains such as fraud detection and risk assessment, where well-calibrated predictions are essential for decision-making. To ensure trustworthy predictions, several GNN calibration methods have been proposed. Though they can improve global calibration, our experiments reveal that they often fail to generalize across different node groups, leading to inaccurate confidence in node groups with different degree levels, classes, and local structures. In certain cases, they even degrade calibration compared to the original uncalibrated GNN. To address this challenge, we propose a novel AdvCali framework that adaptively enhances calibration across different node groups. Our method leverages adversarial training to automatically identify miscalibrated node groups and applies a differentiable Group Expected Calibration Error (ECE) loss term to refine confidence estimation within these groups. This allows the model to dynamically adjust its calibration strategy without relying on dataset-specific prior knowledge about miscalibrated subgroups. Extensive experiments on real-world datasets demonstrate that our approach not only improves global calibration but also significantly enhances calibration within groups defined by feature similarity, topology, and connectivity, outperforming previous methods and demonstrating its effectiveness in practical scenarios.
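As a point of reference for the Group Expected Calibration Error term mentioned above, the following is a minimal sketch (not the paper's code) of the standard binning-based ECE computed over a single node group; the function name, inputs, and bin count are illustrative assumptions.

```python
# Illustrative sketch: binning-based Expected Calibration Error for one node group.
# `conf` holds max softmax confidences, `correct` holds 0/1 correctness indicators.
import numpy as np

def group_ece(conf: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # |average confidence - accuracy| weighted by the bin's share of nodes
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece
```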
Deep Learning within Tabular Data: Foundations, Challenges, Advances and Future Directions
Ren, Weijieying, Zhao, Tianxiang, Huang, Yuqing, Honavar, Vasant
Tabular data remains one of the most prevalent data types across a wide range of real-world applications, yet effective representation learning for this domain poses unique challenges due to its irregular patterns, heterogeneous feature distributions, and complex inter-column dependencies. This survey provides a comprehensive review of state-of-the-art techniques in tabular data representation learning, structured around three foundational design elements: training data, neural architectures, and learning objectives. Unlike prior surveys that focus primarily on either architecture design or learning strategies, we adopt a holistic perspective that emphasizes the universality and robustness of representation learning methods across diverse downstream tasks. We examine recent advances in data augmentation and generation, specialized neural network architectures tailored to tabular data, and innovative learning objectives that enhance representation quality. Additionally, we highlight the growing influence of self-supervised learning and the adaptation of transformer-based foundation models for tabular data. Our review is based on a systematic literature search using rigorous inclusion criteria, encompassing 127 papers published since 2020 in top-tier conferences and journals. Through detailed analysis and comparison, we identify emerging trends, critical gaps, and promising directions for future research, aiming to guide the development of more generalizable and effective tabular data representation methods.
Enhance Graph Alignment for Large Language Models
Luo, Haitong, Meng, Xuying, Wang, Suhang, Zhao, Tianxiang, Wang, Fali, Cao, Hanyun, Zhang, Yujun
Graph-structured data is prevalent in the real world. Recently, owing to their powerful emergent capabilities, Large Language Models (LLMs) have shown promising performance in modeling graphs. The key to effectively applying LLMs on graphs is converting graph data into a format LLMs can comprehend. Graph-to-token approaches are popular in enabling LLMs to process graph information. They transform graphs into sequences of tokens and align them with text tokens through instruction tuning, where self-supervised instruction tuning helps LLMs acquire general knowledge about graphs, and supervised fine-tuning specializes LLMs for downstream tasks on graphs. Despite their initial success, we find that existing methods suffer from a misalignment between self-supervised tasks and supervised downstream tasks, resulting in negative transfer from self-supervised tuning to downstream tasks. To address this issue, we propose Graph Alignment Large Language Models (GALLM) to benefit from aligned task templates. In the self-supervised tuning stage, we introduce a novel text-matching task using templates aligned with downstream tasks. In the task-specific tuning stage, we propose two category prompt methods that learn supervision information from additional explanations with further aligned templates. Experimental evaluations on four datasets demonstrate substantial improvements in supervised learning, multi-dataset generalizability, and particularly zero-shot capability, highlighting the model's potential as a graph foundation model.
Multi-source Unsupervised Domain Adaptation on Graphs with Transferability Modeling
Zhao, Tianxiang, Luo, Dongsheng, Zhang, Xiang, Wang, Suhang
In this paper, we tackle a new problem of \textit{multi-source unsupervised domain adaptation (MSUDA) for graphs}, where models trained on annotated source domains need to be transferred to the unsupervised target graph for node classification. Due to the discrepancy in distribution across domains, the key challenge is how to select good source instances and how to adapt the model. Diverse graph structures further complicate this problem, rendering previous MSUDA approaches less effective. In this work, we present the framework Selective Multi-source Adaptation for Graph ({\method}), with a graph-modeling-based domain selector, a sub-graph node selector, and a bi-level alignment objective for the adaptation. Concretely, to facilitate the identification of informative source data, the similarity across graphs is disentangled and measured with the transferability of a graph-modeling task set, and we use it as evidence for source domain selection. A node selector is further incorporated to capture the variation in transferability of nodes within the same source domain. To learn invariant features for adaptation, we align the target domain to the selected source data both in the embedding space, by minimizing the optimal transport distance, and at the classification level, by distilling the label function. Modules are explicitly learned to select informative source data and conduct the alignment in virtual training splits with a meta-learning strategy. Experimental results on five graph datasets show the effectiveness of the proposed method.
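The embedding-level alignment above relies on an optimal transport distance. Below is a minimal, self-contained sketch (not the paper's implementation) of an entropic optimal transport cost between target and source node embeddings via Sinkhorn iterations; the uniform marginals, squared-Euclidean cost, and all names are assumptions made for illustration.

```python
# Illustrative sketch: entropic optimal transport cost via Sinkhorn iterations.
import numpy as np

def sinkhorn_distance(X_tgt, X_src, reg=0.1, n_iters=100):
    # Pairwise squared-Euclidean cost between target and source embeddings.
    C = np.linalg.norm(X_tgt[:, None, :] - X_src[None, :, :], axis=-1) ** 2
    C = C / (C.max() + 1e-8)                     # normalize cost for numerical stability
    a = np.full(len(X_tgt), 1.0 / len(X_tgt))    # uniform target marginal
    b = np.full(len(X_src), 1.0 / len(X_src))    # uniform source marginal
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]              # transport plan
    return float((P * C).sum())                  # transport cost
```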
Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning
Ren, Weijieying, Li, Xinlong, Wang, Lei, Zhao, Tianxiang, Qin, Wei
Existing research has shown that large language models (LLMs) exhibit remarkable performance in language understanding and generation. However, when LLMs are continuously fine-tuned on complex and diverse domain-specific downstream tasks, their inference performance on historical tasks decreases dramatically, which is known as the catastrophic forgetting problem. A trade-off must be struck between learning plasticity and memory stability. Many existing works have explored strategies such as memory replay, regularization, and parameter isolation, but little is known about the geometric connections between adjacent minima in continual LLM fine-tuning scenarios. In this work, we investigate the geometric connections of different minima through the lens of mode connectivity, which means that different minima can be connected by a low-loss valley. Through extensive experiments, we uncover the mode connectivity phenomenon in the LLM continual learning scenario and find that it can strike a balance between plasticity and stability. Building upon these findings, we propose a simple yet effective method called Interpolation-based LoRA (I-LoRA), which constructs a dual-memory experience replay framework based on LoRA parameter interpolations. Extensive experiments and analysis on eight domain-specific CL benchmarks demonstrate that I-LoRA consistently shows significant improvements over previous state-of-the-art approaches, with up to $11\%$ performance gains, providing a strong baseline and insights for future research on the large language model continual learning problem. Our code is available at \url{https://github.com/which47/LLMCL}.
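The dual-memory idea above centers on interpolating LoRA parameters. A minimal sketch of such an interpolation is given below; it is not the released I-LoRA code, and the fast/slow naming and the fixed mixing coefficient are assumptions for illustration.

```python
# Illustrative sketch: linearly mixing a fast (plastic) and a slow (stable)
# set of LoRA weights, in the spirit of the dual-memory framework described above.
import torch

@torch.no_grad()
def interpolate_lora(fast_state: dict, slow_state: dict, lam: float = 0.5) -> dict:
    """Return a LoRA state dict that interpolates between the two memories."""
    return {k: lam * fast_state[k] + (1.0 - lam) * slow_state[k] for k in fast_state}
```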
Disambiguated Node Classification with Graph Neural Networks
Zhao, Tianxiang, Zhang, Xiang, Wang, Suhang
Graph Neural Networks (GNNs) have demonstrated significant success in learning from graph-structured data across various domains. Despite their great success, one critical challenge is often overlooked by existing works, i.e., learning message propagation that can generalize effectively to underrepresented graph regions. These minority regions often exhibit irregular homophily/heterophily patterns and diverse neighborhood class distributions, resulting in ambiguity. In this work, we investigate the ambiguity problem within GNNs, its impact on representation learning, and the development of richer supervision signals to combat this problem. We conduct a fine-grained evaluation of GNNs, analyzing the existence of ambiguity in different graph regions and its relation to node positions. To disambiguate node embeddings, we propose a novel method, {\method}, which exploits additional optimization guidance to enhance representation learning, particularly for nodes in ambiguous regions. {\method} identifies ambiguous nodes based on the temporal inconsistency of predictions and introduces a disambiguation regularization by employing contrastive learning in a topology-aware manner. {\method} promotes the discriminativity of node representations and alleviates semantic mixing caused by message propagation, effectively addressing the ambiguity problem. Empirical results validate the effectiveness of {\method} and highlight its potential to improve GNN performance in underrepresented graph regions.
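The disambiguation regularization above uses contrastive learning. The following is a generic InfoNCE-style sketch (not the paper's objective) that pulls an ambiguous node's embedding toward a chosen positive and away from negatives; how positives and negatives are selected in a topology-aware manner is the paper's contribution and is not shown here.

```python
# Illustrative sketch: InfoNCE-style contrastive term for one ambiguous node.
import torch
import torch.nn.functional as F

def contrastive_regularizer(anchor, positive, negatives, temperature: float = 0.5):
    """anchor: [d], positive: [d], negatives: [K, d] -> scalar loss."""
    anchor, positive, negatives = (F.normalize(z, dim=-1)
                                   for z in (anchor, positive, negatives))
    pos = (anchor * positive).sum() / temperature    # similarity to the positive
    neg = (negatives @ anchor) / temperature         # similarities to negatives
    # -log( exp(pos) / (exp(pos) + sum_k exp(neg_k)) )
    return -pos + torch.logsumexp(torch.cat([pos.view(1), neg]), dim=0)
```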
Interpretable Imitation Learning with Dynamic Causal Relations
Zhao, Tianxiang, Yu, Wenchao, Wang, Suhang, Wang, Lu, Zhang, Xiang, Chen, Yuncong, Liu, Yanchi, Cheng, Wei, Chen, Haifeng
Imitation learning, which learns agent policy by mimicking expert demonstrations, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, it remains a difficult task to interpret the control policies learned by the agent. The difficulties mainly come from two aspects: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models and lack interpretability; 2) the latent causal mechanism behind agents' decisions may vary along the trajectory, rather than remaining static across time steps. To increase transparency and offer better interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, with nodes being action and state variables and edges denoting the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework, {\method}. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner. After the model is learned, we can obtain the causal relations among state and action variables behind its decisions, exposing the policies it has learned. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed {\method} in learning dynamic causal graphs for understanding the decision-making of imitation learning while maintaining high prediction accuracy.
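The causal discovery above is framed in terms of Granger causality. As background, a minimal sketch (not the paper's module) of a Granger-style dependence score is shown below: the influence of variable j on variable i is approximated by the increase in next-step prediction error when j's history is masked out; the predictor and the masking convention are assumptions for illustration.

```python
# Illustrative sketch: Granger-style dependence score between two variables
# of a multivariate trajectory of state/action variables.
import torch

@torch.no_grad()
def granger_score(predict_i, X: torch.Tensor, i: int, j: int) -> float:
    """X: [T, d] trajectory; predict_i maps a [T-1, d] history to next-step
    predictions of variable i (shape [T-1])."""
    err_full = ((predict_i(X[:-1]) - X[1:, i]) ** 2).mean()
    X_masked = X.clone()
    X_masked[:, j] = 0.0                        # remove variable j's information
    err_masked = ((predict_i(X_masked[:-1]) - X[1:, i]) ** 2).mean()
    return float(err_masked - err_full)         # larger gap -> stronger j -> i influence
```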
Distribution Consistency based Self-Training for Graph Neural Networks with Sparse Labels
Wang, Fali, Zhao, Tianxiang, Wang, Suhang
Few-shot node classification poses a significant challenge for Graph Neural Networks (GNNs) due to insufficient supervision and potential distribution shifts between labeled and unlabeled nodes. Self-training has emerged as a widely used framework to leverage the abundance of unlabeled data; it expands the training set by assigning pseudo-labels to selected unlabeled nodes. Efforts have been made to develop various selection strategies based on confidence, information gain, etc. However, none of these methods takes into account the distribution shift between the training and testing node sets. The pseudo-labeling step may amplify this shift and even introduce new ones, hindering the effectiveness of self-training. Therefore, in this work, we explore the potential of explicitly bridging the distribution shift between the expanded training set and the test set during self-training. To this end, we propose a novel Distribution-Consistent Graph Self-Training (DC-GST) framework to identify pseudo-labeled nodes that are both informative and capable of reducing the distribution discrepancy, and we formulate this selection as a differentiable optimization task. A distribution-shift-aware edge predictor is further adopted to augment the graph and increase the model's generalizability in assigning pseudo-labels. We evaluate our proposed method on four publicly available benchmark datasets, and extensive experiments demonstrate that our framework consistently outperforms state-of-the-art baselines.
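For context, the sketch below shows the generic confidence-thresholded pseudo-labeling step that self-training methods build on; it is not DC-GST's selection rule, which additionally scores candidates by how well they reduce the train/test distribution discrepancy. Names and the threshold value are illustrative.

```python
# Illustrative sketch: confidence-thresholded pseudo-label selection for self-training.
import torch

def select_pseudo_labels(logits: torch.Tensor, unlabeled_idx: torch.Tensor, tau: float = 0.9):
    """logits: [N, C] model outputs; unlabeled_idx: indices of unlabeled nodes."""
    probs = torch.softmax(logits[unlabeled_idx], dim=-1)
    conf, pseudo_y = probs.max(dim=-1)
    keep = conf >= tau                          # keep only high-confidence nodes
    return unlabeled_idx[keep], pseudo_y[keep]  # nodes to add and their pseudo-labels
```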
A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability
Dai, Enyan, Zhao, Tianxiang, Zhu, Huaisheng, Xu, Junjie, Guo, Zhimeng, Liu, Hui, Tang, Jiliang, Wang, Suhang
Graph Neural Networks (GNNs) have developed rapidly in recent years. Due to their great ability in modeling graph-structured data, GNNs are widely used in various applications, including high-stakes scenarios such as financial analysis, traffic prediction, and drug discovery. Despite their great potential to benefit humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of which risk causing unintentional harm to users and society. For example, existing works demonstrate that attackers can fool GNNs into giving the outcomes they desire with unnoticeable perturbations on the training graph. GNNs trained on social networks may embed discrimination in their decision process, strengthening undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent harm from GNN models and increase users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give the taxonomy of related methods and formulate general frameworks for the multiple categories of trustworthy GNNs. We also discuss future research directions for each aspect and the connections between these aspects that help achieve trustworthiness.
T-SaS: Toward Shift-aware Dynamic Adaptation for Streaming Data
Ren, Weijieying, Zhao, Tianxiang, Qin, Wei, Liu, Kunpeng
In many real-world scenarios, distribution shifts exist in streaming data across time steps. Many complex sequential data streams can be effectively divided into distinct regimes that exhibit persistent dynamics. Discovering the shifted behaviors and the evolving patterns underlying the streaming data is important for understanding the dynamic system. Existing methods typically either train one robust model to work for the evolving data of distinct distributions or sequentially adapt the model using explicitly given regime boundaries. However, there are two challenges: (1) shifts in data streams can happen drastically and abruptly without precursors, and boundaries of distribution shifts are usually unavailable; and (2) training a shared model for all domains can fail to capture varying patterns. This paper aims to solve the problem of sequential data modeling in the presence of sudden distribution shifts that occur without any precursors. Specifically, we design a Bayesian framework, dubbed T-SaS, with a discrete distribution-modeling variable to capture abrupt shifts in the data. We then design a model that enables adaptation through dynamic network selection conditioned on that discrete variable. The proposed method learns specific model parameters for each distribution by learning which neurons should be activated in the full network. A dynamic masking strategy is adopted to support inter-distribution transfer through the overlap of a set of sparse networks. Extensive experiments show that our proposed method is superior both in accurately detecting shift boundaries to segment data of varying distributions and in effectively adapting to downstream forecasting or classification tasks.
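The dynamic network selection above amounts to activating a regime-specific sparse sub-network. A minimal sketch of that idea is given below (not the paper's architecture): a shared layer is gated by per-regime binary masks whose overlap permits transfer across regimes; the mask construction and the integer regime index are assumptions for illustration.

```python
# Illustrative sketch: regime-conditioned masking of a shared hidden layer.
import torch
import torch.nn as nn

class MaskedRegimeLayer(nn.Module):
    def __init__(self, d_in: int, d_hidden: int, n_regimes: int, keep_prob: float = 0.5):
        super().__init__()
        self.linear = nn.Linear(d_in, d_hidden)  # weights shared across regimes
        # Fixed random binary masks; overlapping entries allow inter-regime transfer.
        self.register_buffer("masks",
                             (torch.rand(n_regimes, d_hidden) < keep_prob).float())

    def forward(self, x: torch.Tensor, z: int) -> torch.Tensor:
        # z is the inferred regime index from the shift-detection component.
        return torch.relu(self.linear(x)) * self.masks[z]
```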