Zheng, Zaiyi
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Zhang, Binchi, Zheng, Zaiyi, Chen, Zhengzhang, Li, Jundong
Symmetry in the parameter space of deep neural networks (DNNs) has proven beneficial for various deep learning applications. A well-known example is the permutation symmetry in Multi-Layer Perceptrons (MLPs), where permuting the rows of weight matrices in one layer and applying the inverse permutation to adjacent layers yields a functionally equivalent model. While permutation symmetry fully characterizes the equivalence set for MLPs, its discrete nature limits its utility for transformers. In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry.
For instance, in a two-layer MLP, permuting the rows of the weight matrix in the first layer and applying the corresponding inverse permutation to the second layer results in a functionally equivalent model, i.e., the outputs of the original and permuted models remain identical for any given input (Ainsworth et al., 2023). All functionally equivalent models corresponding to weight permutations form an equivalence set, which provides theoretical insights into neural network optimization, such as the linear mode connectivity of loss landscapes (Entezari et al., 2022; Zhou et al., 2023). In addition, permutation symmetry has also proven helpful in advancing neural network applications, such as model fusion (Singh & Jaggi, 2020; Ainsworth et al., 2023) and optimization (Zhao et al., 2024).
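The permutation symmetry described above can be checked numerically. The sketch below (toy weights, not the paper's code) verifies the two-layer MLP case, and also hints at why rotations require a linear pairing, as between the value and output projections of an attention layer, where no elementwise nonlinearity sits in between:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP: f(x) = W2 @ relu(W1 @ x)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))

def mlp(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# Permute the hidden units: rows of W1, matching columns of W2.
P = np.eye(8)[rng.permutation(8)]   # permutation matrix
W1_perm = P @ W1                    # permute layer-1 rows
W2_perm = W2 @ P.T                  # apply the inverse permutation to layer 2

x = rng.standard_normal(4)
assert np.allclose(mlp(W1, W2, x), mlp(W1_perm, W2_perm, x))

# A full rotation works when the two matrices compose linearly:
# for any orthogonal R, (W2 @ R.T) @ (R @ W1) = W2 @ W1.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random orthogonal matrix
assert np.allclose((W2 @ Q.T) @ (Q @ W1 @ x), W2 @ (W1 @ x))
```

The first assertion holds because a permutation commutes with the elementwise ReLU; an arbitrary rotation does not, which is why rotation symmetry targets linearly paired transformer weights.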
Resolving Editing-Unlearning Conflicts: A Knowledge Codebook Framework for Large Language Model Updating
Zhang, Binchi, Chen, Zhengzhang, Zheng, Zaiyi, Li, Jundong, Chen, Haifeng
Large Language Models (LLMs) excel in natural language processing by encoding extensive human knowledge, but their utility relies on timely updates as knowledge evolves. Updating LLMs involves two key tasks simultaneously: unlearning to remove unwanted knowledge and editing to incorporate new information. Existing methods face two major challenges: ineffective knowledge storage (either too sparse or too dense) and task conflicts between editing and unlearning, as validated through our theoretical and experimental results. To address these issues, we propose LOKA, a conflict-free framework for LLM updating based on a knowledge codebook. During training, updated knowledge is stored in multiple codebook memories. To optimize knowledge storage, a similarity-aware knowledge mapping ensures that related knowledge pieces are clustered and allocated to the same memory. Additionally, LOKA resolves task conflicts by employing task-specific and multi-task memories guided by a conflict score. In the inference stage, LOKA retrieves the most relevant memory from the codebook and plugs it into the original LLM to apply the updated knowledge. A learning-based router controls codebook activation to further improve knowledge utilization. Extensive experiments demonstrate the effectiveness of LOKA in LLM knowledge updating tasks.
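As a rough illustration of the retrieve-and-plug idea only, the toy sketch below performs similarity-based memory lookup with a rejection threshold standing in for the router; all names, shapes, and the scoring rule are our own assumptions, not LOKA's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical codebook: each memory pairs a key embedding with an
# update payload; related knowledge pieces would share nearby keys.
codebook = [
    {"key": rng.standard_normal(16), "patch": f"memory_{i}"}
    for i in range(4)
]

def retrieve(query_emb, codebook, threshold=0.0):
    """Return the memory most cosine-similar to the query, or None
    if no memory clears the threshold (router declines to activate)."""
    sims = [
        float(query_emb @ m["key"])
        / (np.linalg.norm(query_emb) * np.linalg.norm(m["key"]))
        for m in codebook
    ]
    best = int(np.argmax(sims))
    return codebook[best] if sims[best] > threshold else None

# A query matching a stored key retrieves that memory; an unreachable
# threshold makes the router fall back to the original LLM (None).
hit = retrieve(codebook[2]["key"], codebook)
miss = retrieve(codebook[2]["key"], codebook, threshold=1.5)
```

The threshold models the learned router's decision of whether to activate the codebook at all; the real framework trains this decision rather than hard-coding it.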
KG-CF: Knowledge Graph Completion with Context Filtering under the Guidance of Large Language Models
Zheng, Zaiyi, Dong, Yushun, Wang, Song, Liu, Haochen, Wang, Qi, Li, Jundong
Large Language Models (LLMs) have shown impressive performance in various tasks, including knowledge graph completion (KGC). However, current studies mostly apply LLMs to classification tasks, like identifying missing triplets, rather than ranking-based tasks, where the model ranks candidate entities based on plausibility. This focus limits the practical use of LLMs in KGC, as real-world applications prioritize highly plausible triplets. Additionally, while graph paths can help infer the existence of missing triplets and improve completion accuracy, they often contain redundant information. To address these issues, we propose KG-CF, a framework tailored for ranking-based KGC tasks. KG-CF leverages LLMs' reasoning abilities to filter out irrelevant contexts, achieving superior results on real-world datasets. The code and datasets are available at https://anonymous.4open.science/r/KG-CF.
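For context, ranking-based KGC scores every candidate entity for a query like (head, relation, ?) and sorts by plausibility. The minimal sketch below uses a stand-in scorer; KG-CF's contribution is filtering the context paths fed to the scorer, not this loop itself:

```python
# Toy ranking-based KG completion for the query (Paris, capital_of, ?).
triples = {("Paris", "capital_of", "France"),
           ("Berlin", "capital_of", "Germany")}
candidates = ["Spain", "Germany", "France"]

def plausibility(h, r, t):
    # Placeholder scorer: 1.0 for known triples, 0.0 otherwise. A real
    # system would score with an embedding model or an LLM conditioned
    # on (filtered) graph paths.
    return 1.0 if (h, r, t) in triples else 0.0

ranked = sorted(candidates,
                key=lambda t: plausibility("Paris", "capital_of", t),
                reverse=True)
print(ranked[0])  # "France" ranks first
```

Evaluation metrics for this setting (Hits@k, mean reciprocal rank) are computed from the position of the true entity in `ranked`, which is why ranking rather than binary classification matters in practice.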
Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction
He, Yinhan, Zheng, Zaiyi, Soga, Patrick, Zhu, Yaozhen, Dong, Yushun, Li, Jundong
In recent years, Graph Neural Networks (GNNs) have become successful in molecular property prediction tasks such as toxicity analysis. However, due to the black-box nature of GNNs, their outputs can be concerning in high-stakes decision-making scenarios, e.g., drug discovery. Facing such an issue, Graph Counterfactual Explanation (GCE) has emerged as a promising approach to improve GNN transparency. However, current GCE methods usually fail to take domain-specific knowledge into consideration, which can result in outputs that are not easily comprehensible by humans. To address this challenge, we propose a novel GCE method, LLM-GCE, to unleash the power of large language models (LLMs) in explaining GNNs for molecular property prediction. Specifically, we utilize an autoencoder to generate the counterfactual graph topology from a set of counterfactual text pairs (CTPs) based on an input graph. Meanwhile, we also incorporate a CTP dynamic feedback module to mitigate LLM hallucination, which provides intermediate feedback derived from the generated counterfactuals as an attempt to give more faithful guidance. Extensive experiments demonstrate the superior performance of LLM-GCE. Our code is released on https://github.com/YinhanHe123/new_LLM4GNNExplanation.
A Benchmark for Fairness-Aware Graph Learning
Dong, Yushun, Wang, Song, Lei, Zhenyu, Zheng, Zaiyi, Ma, Jing, Chen, Chen, Li, Jundong
Fairness-aware graph learning has gained increasing attention in recent years. Nevertheless, a comprehensive benchmark to evaluate and compare different fairness-aware graph learning methods is still lacking, which prevents practitioners from choosing appropriate ones for broader real-world applications. In this paper, we present an extensive benchmark on ten representative fairness-aware graph learning methods. Specifically, we design a systematic evaluation protocol and conduct experiments on seven real-world datasets to evaluate these methods from multiple perspectives, including group fairness, individual fairness, the balance between different fairness criteria, and computational efficiency. Our in-depth analysis reveals key insights into the strengths and limitations of existing methods. Additionally, we provide practical guidance for applying fairness-aware graph learning methods in real-world scenarios. To the best of our knowledge, this work serves as an initial step toward a comprehensive understanding of representative fairness-aware graph learning methods, facilitating future advancements in this area.
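Group fairness in such benchmarks is typically quantified with metrics like the statistical parity difference and the equal opportunity difference. The sketch below implements their standard definitions on toy labels; it is a generic illustration, not code from the benchmark:

```python
import numpy as np

def statistical_parity_diff(y_pred, sensitive):
    """|P(y_hat=1 | s=0) - P(y_hat=1 | s=1)| -- lower is fairer."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    return abs(y_pred[sensitive == 0].mean() - y_pred[sensitive == 1].mean())

def equal_opportunity_diff(y_pred, y_true, sensitive):
    """Same gap restricted to truly positive examples (a TPR gap)."""
    y_pred, y_true, sensitive = map(np.asarray, (y_pred, y_true, sensitive))
    pos = y_true == 1
    return abs(y_pred[pos & (sensitive == 0)].mean()
               - y_pred[pos & (sensitive == 1)].mean())

y_pred    = [1, 0, 1, 1, 0, 1]
y_true    = [1, 0, 1, 0, 1, 1]
sensitive = [0, 0, 0, 1, 1, 1]
print(statistical_parity_diff(y_pred, sensitive))          # 0.0
print(equal_opportunity_diff(y_pred, y_true, sensitive))   # 0.5
```

The example shows why multiple perspectives matter: these predictions are perfectly fair under statistical parity (both groups receive positives at the same rate) yet unfair under equal opportunity (true-positive rates of 1.0 vs. 0.5).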
Knowledge Editing for Large Language Models: A Survey
Wang, Song, Zhu, Yaochen, Liu, Haochen, Zheng, Zaiyi, Chen, Chen, Li, Jundong
Large language models (LLMs) have recently transformed both the academic and industrial landscapes due to their remarkable capacity to understand, analyze, and generate texts based on their vast knowledge and reasoning ability. Nevertheless, one major drawback of LLMs is the substantial computational cost of pre-training, owing to their unprecedented number of parameters. This disadvantage is exacerbated when new knowledge frequently needs to be introduced into the pre-trained model. Therefore, it is imperative to develop effective and efficient techniques to update pre-trained LLMs. Traditional methods encode new knowledge in pre-trained LLMs through direct fine-tuning. However, naively re-training LLMs can be computationally intensive and risks degrading valuable pre-trained knowledge in the model that is irrelevant to the update. Recently, Knowledge-based Model Editing (KME) has attracted increasing attention; it aims to precisely modify LLMs to incorporate specific knowledge without negatively influencing other irrelevant knowledge. In this survey, we aim to provide a comprehensive and in-depth overview of recent advances in the field of KME. We first introduce a general formulation of KME to encompass different KME strategies. Afterward, we provide an innovative taxonomy of KME techniques based on how the new knowledge is introduced into pre-trained LLMs, and investigate existing KME strategies while analyzing key insights, advantages, and limitations of methods from each category. Moreover, representative metrics, datasets, and applications of KME are introduced accordingly. Finally, we provide an in-depth analysis regarding the practicality and remaining challenges of KME and suggest promising research directions for further advancement in this field.