AITopics | Xu, Zenan

Collaborating Authors

Xu, Zenan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE

Bai, Ting, Yu, Yue, Huang, Le, Xu, Zenan, Zhao, Zhe, Shi, Chuan

arXiv.org Artificial IntelligenceDec-17-2024

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method that has been widely adopted in various downstream applications of LLMs. Together with the Mixture-of-Expert (MoE) technique, fine-tuning approaches have shown remarkable improvements in model capability. However, the coordination of multiple experts in existing studies solely relies on the weights assigned by the simple router function. Lack of communication and collaboration among experts exacerbate the instability of LLMs due to the imbalance load problem of MoE. To address this issue, we propose a novel MoE graph-based LLM fine-tuning framework GraphLoRA, in which a graph router function is designed to capture the collaboration signals among experts by graph neural networks (GNNs). GraphLoRA enables all experts to understand input knowledge and share information from neighbor experts by aggregating operations. Besides, to enhance each expert's capability and their collaborations, we design two novel coordination strategies: the Poisson distribution-based distinction strategy and the Normal distribution-based load balance strategy. Extensive experiments on four real-world datasets demonstrate the effectiveness of our GraphLoRA in parameter-efficient fine-tuning of LLMs, showing the benefits of facilitating collaborations of multiple experts in the graph router of GraphLoRA.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2412.16216

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code

Chen, Jiachi, Zhong, Qingyuan, Wang, Yanlin, Ning, Kaiwen, Liu, Yongkun, Xu, Zenan, Zhao, Zhe, Chen, Ting, Zheng, Zibin

arXiv.org Artificial IntelligenceSep-23-2024

The emergence of Large Language Models (LLMs) has significantly influenced various aspects of software development activities. Despite their benefits, LLMs also pose notable risks, including the potential to generate harmful content and being abused by malicious developers to create malicious code. Several previous studies have focused on the ability of LLMs to resist the generation of harmful content that violates human ethical standards, such as biased or offensive content. However, there is no research evaluating the ability of LLMs to resist malicious code generation. To fill this gap, we propose RMCBench, the first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation. This benchmark employs two scenarios: a text-to-code scenario, where LLMs are prompted with descriptions to generate code, and a code-to-code scenario, where LLMs translate or complete existing malicious code. Based on RMCBench, we conduct an empirical study on 11 representative LLMs to assess their ability to resist malicious code generation. Our findings indicate that current LLMs have a limited ability to resist malicious code generation with an average refusal rate of 40.36% in text-to-code scenario and 11.52% in code-to-code scenario. The average refusal rate of all LLMs in RMCBench is only 28.71%; ChatGPT-4 has a refusal rate of only 35.73%. We also analyze the factors that affect LLMs' ability to resist malicious code generation and provide implications for developers to enhance model robustness.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3691620.3695480

2409.15154

Country:

North America > United States (0.15)
Asia > China (0.15)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

Xu, Zenan, Meng, Xiaojun, Wang, Yasheng, Su, Qinliang, Qiu, Zexuan, Jiang, Xin, Liu, Qun

arXiv.org Artificial IntelligenceMay-8-2023

Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the success of the large-scale generative pre-trained language model (GPLM) in generating high-quality textual content (e.g., summary), recent MAS methods have proposed to adapt the GPLM to this task by equipping it with the visual information, which is often obtained through a general-purpose visual feature extractor. However, the generally extracted visual features may overlook some summary-worthy visual information, which impedes model performance. In this work, we propose a novel approach to learning the summary-worthy visual representation that facilitates abstractive summarization. Our method exploits the summary-worthy information from both the cross-modal transcript data and the knowledge that distills from the pseudo summary. Extensive experiments on three public multimodal datasets show that our method outperforms all competing baselines. Furthermore, with the advantages of summary-worthy visual information, our model can have a significant improvement on small datasets or even datasets with limited training data.

information, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.04824

Country: Asia > China (0.47)

Genre: Research Report > Promising Solution (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Reasoning Over Semantic-Level Graph for Fact Checking

Zhong, Wanjun, Xu, Jingjing, Tang, Duyu, Xu, Zenan, Duan, Nan, Zhou, Ming, Wang, Jiahai, Yin, Jian

arXiv.org Artificial IntelligenceSep-13-2019

We study fact-checking in this paper, which aims to verify a textual claim given textual evidence (e.g., retrieved sentences from Wikipedia). Existing studies typically either concatenate retrieved sentences as a single string or use feature fusion on the top of features of sentences, while ignoring semantic-level information including participants, location, and temporality of an event occurred in a sentence and relationships among multiple events. Such semantic-level information is crucial for understanding the relational structure of evidence and the deep reasoning procedure over that. In this paper, we address this issue by proposing a graph-based reasoning framework, called the Dynamic REAsoning Machine (DREAM) framework. We first construct a semantic-level graph, where nodes are extracted by semantic role labeling toolkits and are connected by inner- and inter- sentence edges. After having the automatically constructed graph, we use XLNet as the backbone of our approach and propose a graph-based contextual word representation learning module and a graph-based reasoning module to leverage the information of graphs. The first module is designed by considering a claim as a sequence, in which case we use the graph structure to re-define the relative distance of words. On top of this, we propose the second module by considering both the claim and the evidence as graphs and use a graph neural network to capture the semantic relationship at a more abstract level. We conduct experiments on FEVER, a large-scale benchmark dataset for fact-checking. Results show that both of the graph-based modules improve performance. Our system is the state-of-the-art system on the public leaderboard in terms of both accuracy and FEVER score.

information, survey article, text processing, (15 more...)

arXiv.org Artificial Intelligence

1909.03745

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback