Goto

Collaborating Authors

 Xu, Zenan


GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE

arXiv.org Artificial Intelligence

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method that has been widely adopted in various downstream applications of LLMs. Together with the Mixture-of-Expert (MoE) technique, fine-tuning approaches have shown remarkable improvements in model capability. However, the coordination of multiple experts in existing studies solely relies on the weights assigned by the simple router function. Lack of communication and collaboration among experts exacerbate the instability of LLMs due to the imbalance load problem of MoE. To address this issue, we propose a novel MoE graph-based LLM fine-tuning framework GraphLoRA, in which a graph router function is designed to capture the collaboration signals among experts by graph neural networks (GNNs). GraphLoRA enables all experts to understand input knowledge and share information from neighbor experts by aggregating operations. Besides, to enhance each expert's capability and their collaborations, we design two novel coordination strategies: the Poisson distribution-based distinction strategy and the Normal distribution-based load balance strategy. Extensive experiments on four real-world datasets demonstrate the effectiveness of our GraphLoRA in parameter-efficient fine-tuning of LLMs, showing the benefits of facilitating collaborations of multiple experts in the graph router of GraphLoRA.


RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code

arXiv.org Artificial Intelligence

The emergence of Large Language Models (LLMs) has significantly influenced various aspects of software development activities. Despite their benefits, LLMs also pose notable risks, including the potential to generate harmful content and being abused by malicious developers to create malicious code. Several previous studies have focused on the ability of LLMs to resist the generation of harmful content that violates human ethical standards, such as biased or offensive content. However, there is no research evaluating the ability of LLMs to resist malicious code generation. To fill this gap, we propose RMCBench, the first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation. This benchmark employs two scenarios: a text-to-code scenario, where LLMs are prompted with descriptions to generate code, and a code-to-code scenario, where LLMs translate or complete existing malicious code. Based on RMCBench, we conduct an empirical study on 11 representative LLMs to assess their ability to resist malicious code generation. Our findings indicate that current LLMs have a limited ability to resist malicious code generation with an average refusal rate of 40.36% in text-to-code scenario and 11.52% in code-to-code scenario. The average refusal rate of all LLMs in RMCBench is only 28.71%; ChatGPT-4 has a refusal rate of only 35.73%. We also analyze the factors that affect LLMs' ability to resist malicious code generation and provide implications for developers to enhance model robustness.


Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

arXiv.org Artificial Intelligence

Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the success of the large-scale generative pre-trained language model (GPLM) in generating high-quality textual content (e.g., summary), recent MAS methods have proposed to adapt the GPLM to this task by equipping it with the visual information, which is often obtained through a general-purpose visual feature extractor. However, the generally extracted visual features may overlook some summary-worthy visual information, which impedes model performance. In this work, we propose a novel approach to learning the summary-worthy visual representation that facilitates abstractive summarization. Our method exploits the summary-worthy information from both the cross-modal transcript data and the knowledge that distills from the pseudo summary. Extensive experiments on three public multimodal datasets show that our method outperforms all competing baselines. Furthermore, with the advantages of summary-worthy visual information, our model can have a significant improvement on small datasets or even datasets with limited training data.


Reasoning Over Semantic-Level Graph for Fact Checking

arXiv.org Artificial Intelligence

We study fact-checking in this paper, which aims to verify a textual claim given textual evidence (e.g., retrieved sentences from Wikipedia). Existing studies typically either concatenate retrieved sentences as a single string or use feature fusion on the top of features of sentences, while ignoring semantic-level information including participants, location, and temporality of an event occurred in a sentence and relationships among multiple events. Such semantic-level information is crucial for understanding the relational structure of evidence and the deep reasoning procedure over that. In this paper, we address this issue by proposing a graph-based reasoning framework, called the Dynamic REAsoning Machine (DREAM) framework. We first construct a semantic-level graph, where nodes are extracted by semantic role labeling toolkits and are connected by inner- and inter- sentence edges. After having the automatically constructed graph, we use XLNet as the backbone of our approach and propose a graph-based contextual word representation learning module and a graph-based reasoning module to leverage the information of graphs. The first module is designed by considering a claim as a sequence, in which case we use the graph structure to re-define the relative distance of words. On top of this, we propose the second module by considering both the claim and the evidence as graphs and use a graph neural network to capture the semantic relationship at a more abstract level. We conduct experiments on FEVER, a large-scale benchmark dataset for fact-checking. Results show that both of the graph-based modules improve performance. Our system is the state-of-the-art system on the public leaderboard in terms of both accuracy and FEVER score.