Xiao, Chaowei
AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
Luo, Weidi, Dai, Shenghong, Liu, Xiaogeng, Banerjee, Suman, Sun, Huan, Chen, Muhao, Xiao, Chaowei
The rapid advancements in Large Language Models (LLMs) have enabled their deployment as autonomous agents for handling complex tasks in dynamic environments. These LLMs demonstrate strong problem-solving capabilities and adaptability to multifaceted scenarios. However, their use as agents also introduces significant risks, including task-specific risks, which are identified by the agent administrator based on the specific task requirements and constraints, and systemic risks, which stem from vulnerabilities in their design or interactions, potentially compromising the confidentiality, integrity, or availability (CIA) of information and triggering security risks. Existing defense methods fail to adaptively and effectively mitigate these risks. In this paper, we propose AGrail, a lifelong agent guardrail that enhances LLM agent safety through adaptive safety check generation, effective safety check optimization, and tool compatibility and flexibility. Extensive experiments demonstrate that AGrail not only achieves strong performance against task-specific and systemic risks but also exhibits transferability across tasks of different LLM agents.
VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap
Liu, Qin, Wang, Fei, Xiao, Chaowei, Chen, Muhao
The emergence of vision language models (VLMs) comes with increased safety concerns, as the incorporation of multiple modalities heightens vulnerability to attacks. Although VLMs can be built upon LLMs that have textual safety alignment, this alignment is easily undermined when the vision modality is integrated. We attribute this safety challenge to the modality gap, a separation of image and text in the shared representation space, which blurs the distinction between harmful and harmless queries that is evident in LLMs but weakened in VLMs. To avoid safety decay and fulfill the safety alignment gap, we propose VLM-Guard, an inference-time intervention strategy that leverages the LLM component of a VLM as supervision for the safety alignment of the VLM. VLM-Guard projects the VLM's representations into the subspace orthogonal to the safety steering direction extracted from the safety-aligned LLM. Experimental results on three malicious instruction settings show the effectiveness of VLM-Guard in safeguarding the VLM and fulfilling the safety alignment gap between the VLM and its LLM component.
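At its core, the intervention is a single orthogonal projection applied to hidden states at inference time. The sketch below illustrates that step, assuming the safety steering direction has already been extracted from the LLM component; the function name, tensor shapes, and the mean-difference extraction mentioned in the comments are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def project_out_steering_direction(hidden_states: torch.Tensor,
                                    steering_dir: torch.Tensor) -> torch.Tensor:
    """Map hidden states onto the subspace orthogonal to a safety steering direction.

    hidden_states: (batch, seq_len, d_model) activations at some layer of the VLM.
    steering_dir:  (d_model,) direction obtained from the safety-aligned LLM component,
                   e.g. a mean activation difference between harmful and harmless prompts
                   (an assumed extraction method for this sketch).
    """
    u = steering_dir / steering_dir.norm()      # unit vector along the steering direction
    coeff = hidden_states @ u                   # (batch, seq_len): component along u
    return hidden_states - coeff.unsqueeze(-1) * u
```

Because the projection is applied only at inference, a sketch like this requires no retraining of the VLM; the LLM component serves purely as the source of the steering direction.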
Safety at Scale: A Comprehensive Survey of Large Model Safety
Ma, Xingjun, Gao, Yifeng, Wang, Yixu, Wang, Ruofan, Wang, Xin, Sun, Ye, Ding, Yifan, Xu, Hengyuan, Chen, Yunhao, Zhao, Yunhan, Huang, Hanxun, Li, Yige, Zhang, Jiaming, Zheng, Xiang, Bai, Yang, Wu, Zuxuan, Qiu, Xipeng, Zhang, Jingfeng, Li, Yiming, Sun, Jun, Wang, Cong, Gu, Jindong, Wu, Baoyuan, Chen, Siheng, Zhang, Tianwei, Liu, Yang, Gong, Mingming, Liu, Tongliang, Pan, Shirui, Xie, Cihang, Pang, Tianyu, Dong, Yinpeng, Jia, Ruoxi, Zhang, Yang, Ma, Shiqing, Zhang, Xiangyu, Gong, Neil, Xiao, Chaowei, Erfani, Sarah, Li, Bo, Sugiyama, Masashi, Tao, Dacheng, Bailey, James, Jiang, Yu-Gang
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.
DreamDrive: Generative 4D Scene Modeling from Street View Images
Mao, Jiageng, Li, Boyi, Ivanovic, Boris, Chen, Yuxiao, Wang, Yan, You, Yurong, Xiao, Chaowei, Xu, Danfei, Pavone, Marco, Wang, Yue
Synthesizing photo-realistic visual observations from an ego vehicle's driving trajectory is a critical step towards scalable training of self-driving models. Reconstruction-based methods create 3D scenes from driving logs and synthesize geometry-consistent driving videos through neural rendering, but their dependence on costly object annotations limits their ability to generalize to in-the-wild driving scenarios. On the other hand, generative models can synthesize action-conditioned driving videos in a more generalizable way but often struggle with maintaining 3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction, to synthesize generalizable 4D driving scenes and dynamic driving videos with 3D consistency. Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references and further elevate them to 4D with a novel hybrid Gaussian representation. Given a driving trajectory, we then render 3D-consistent driving videos via Gaussian splatting. The use of generative priors allows our method to produce high-quality 4D scenes from in-the-wild driving data, while neural rendering ensures 3D-consistent video generation from the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate that DreamDrive can generate controllable and generalizable 4D driving scenes, synthesize novel views of driving videos with high fidelity and 3D consistency, decompose static and dynamic elements in a self-supervised manner, and enhance perception and planning tasks for autonomous driving.
RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process
Wang, Peiran, Liu, Xiaogeng, Xiao, Chaowei
In this study, we introduce RePD, an innovative Retrieval-based Prompt Decomposition framework designed to mitigate the risk of jailbreak attacks on large language models (LLMs). Despite rigorous pretraining and finetuning focused on ethical alignment, LLMs are still susceptible to jailbreak exploits. RePD operates on a one-shot learning model, wherein it accesses a database of pre-collected jailbreak prompt templates to identify and decompose harmful inquiries embedded within user prompts. This process integrates the decomposition of the jailbreak prompt and the user's original query into a one-shot learning example that effectively teaches the LLM to discern and separate malicious components. Consequently, the LLM first neutralizes any potentially harmful elements before addressing the user's prompt in a manner that aligns with its ethical guidelines. RePD is versatile and compatible with a variety of open-source LLMs acting as agents. Through comprehensive experimentation with both harmful and benign prompts, we demonstrate the efficacy of RePD in enhancing the resilience of LLMs against jailbreak attacks without compromising their performance on typical user requests.
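The retrieval-plus-decomposition flow lends itself to a short illustration. The sketch below retrieves the most similar stored jailbreak template and builds a one-shot decomposition example to prepend to the user's prompt; the embedding function, database fields, and prompt wording are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def retrieve_template(user_prompt: str, db: list, embed) -> dict:
    """Return the stored jailbreak entry most similar to the user prompt (dot-product score)."""
    q = embed(user_prompt)
    scores = [float(np.dot(q, entry["embedding"])) for entry in db]
    return db[int(np.argmax(scores))]

def build_one_shot_prompt(user_prompt: str, entry: dict) -> str:
    """Prepend a worked decomposition example so the LLM separates template from harmful core."""
    example = (
        "Example jailbreak prompt:\n" + entry["template"] + "\n"
        "Decomposition, template part: " + entry["template_part"] + "\n"
        "Decomposition, harmful core: " + entry["harmful_core"] + "\n\n"
    )
    instruction = (
        "Following the example, first decompose the prompt below, refuse any harmful core, "
        "and answer only the benign remainder:\n"
    )
    return example + instruction + user_prompt
```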
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Liu, Xiaogeng, Li, Peiran, Suh, Edward, Vorobeychik, Yevgeniy, Mao, Zhuoqing, Jha, Somesh, McDaniel, Patrick, Sun, Huan, Li, Bo, Xiao, Chaowei
In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and use them for red-teaming. As a result, AutoDAN-Turbo significantly outperforms baseline methods, achieving a 74.3% higher average attack success rate on public benchmarks. Notably, AutoDAN-Turbo achieves an 88.5% attack success rate on GPT-4-1106-turbo. In addition, AutoDAN-Turbo is a unified framework that can incorporate existing human-designed jailbreak strategies in a plug-and-play manner. By integrating these human-designed strategies, AutoDAN-Turbo achieves an even higher attack success rate of 93.4% on GPT-4-1106-turbo.
FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks
Wang, Jiongxiao, Wu, Fangzhou, Li, Wendi, Pan, Jinsheng, Suh, Edward, Mao, Z. Morley, Chen, Muhao, Xiao, Chaowei
Large language models (LLMs) have been widely deployed as the backbone, together with additional tools and text information, of real-world applications. However, integrating external information into LLM-integrated applications raises significant security concerns. Among these, prompt injection attacks are particularly threatening: malicious instructions injected into the external text can exploit LLMs to generate the answers attackers desire. While both training-time and test-time defense methods have been developed to mitigate such attacks, the unaffordable training costs of training-time methods and the limited effectiveness of existing test-time methods make them impractical. This paper introduces a novel test-time defense strategy, named Formatting AuThentication with Hash-based tags (FATH). Unlike existing approaches that prevent LLMs from answering additional instructions in external text, our method implements an authentication system that requires LLMs to answer all received instructions under a security policy and selectively filters responses to the user's instructions as the final output. To achieve this, we utilize hash-based authentication tags to label each response, enabling accurate identification of the responses that correspond to the user's instructions and improving robustness against adaptive attacks. Comprehensive experiments demonstrate that our defense method effectively mitigates indirect prompt injection attacks, achieving state-of-the-art performance with Llama3 and GPT-3.5 across various attack methods. Our code is released at: https://github.com/Jayfeather1024/FATH
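As a toy illustration of the hash-based tagging idea, the sketch below derives a fresh tag for the trusted user instruction and keeps only the response labeled with that tag; the tag format, helper names, and selection logic are assumptions for illustration and do not reproduce FATH's actual prompt formatting or security policy.

```python
import hashlib
import secrets

def make_tag(secret: str, instruction: str) -> str:
    """Derive a short authentication tag binding an instruction to a per-query secret."""
    return hashlib.sha256(f"{secret}:{instruction}".encode()).hexdigest()[:16]

def wrap_user_instruction(instruction: str) -> tuple:
    """Attach a tag to the trusted user instruction before it is sent to the LLM."""
    secret = secrets.token_hex(8)              # fresh secret per query
    tag = make_tag(secret, instruction)
    return f"[TAG {tag}] {instruction}", tag

def select_authentic_response(tagged_responses: dict, tag: str):
    """Keep only the response labeled with the user's tag; all other responses are dropped."""
    return tagged_responses.get(tag)
```

In this toy version, instructions injected through external text cannot forge a valid label because the per-query secret never appears in the untrusted content, which is the intuition behind tag-based response filtering.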
LeanAgent: Lifelong Learning for Formal Theorem Proving
Kumarappan, Adarsh, Tiwari, Mo, Song, Peiyang, George, Robert Joseph, Xiao, Chaowei, Anandkumar, Anima
Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for formal theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 155 theorems previously unproved formally by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.
SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
Liu, Qin, Wang, Fei, Xiao, Chaowei, Chen, Muhao
Existing preference alignment is a one-size-fits-all mechanism: the part of the large language model (LLM) parametric knowledge with non-preferred features is uniformly blocked for all users. However, this knowledge can be useful to advanced users whose expertise qualifies them to handle it, so the one-size-fits-all mechanism undermines the LLM's utility for these qualified users. To address this problem, we propose SudoLM, a framework that lets LLMs learn access control over specific parametric knowledge for users with different credentials via authorization alignment. SudoLM allows authorized users to unlock access to all the parametric knowledge with an assigned SUDO key while blocking access for non-qualified users. Experiments on two application scenarios demonstrate that SudoLM effectively controls users' access to the parametric knowledge and maintains its general utility.
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Liu, Qin, Mo, Wenjie, Tong, Terry, Xu, Jiashu, Wang, Fei, Xiao, Chaowei, Chen, Muhao
The advancement of Large Language Models (LLMs) has significantly impacted various domains, including Web search, healthcare, and software development. However, as these models scale, they become more vulnerable to cybersecurity risks, particularly backdoor attacks. By exploiting the potent memorization capacity of LLMs, adversaries can easily inject backdoors by manipulating a small portion of training data, leading to malicious behaviors in downstream applications whenever the hidden backdoor is activated by pre-defined triggers. Moreover, emerging learning paradigms like instruction tuning and reinforcement learning from human feedback (RLHF) exacerbate these risks, as they rely heavily on crowdsourced data and human feedback that are not fully controlled. In this paper, we present a comprehensive survey of emerging backdoor threats to LLMs that appear during development or inference, and we cover recent advancements in both defense and detection strategies for mitigating these threats. We also outline key challenges in addressing them, highlighting areas for future research.