Xiang, Chong
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Wu, Tong, Zhang, Shujian, Song, Kaiqiang, Xu, Silei, Zhao, Sanqiang, Agrawal, Ravi, Indurthi, Sathish Reddy, Xiang, Chong, Mittal, Prateek, Zhou, Wenxuan
Large Language Models (LLMs) are susceptible to security and safety threats, such as prompt injection, prompt extraction, and harmful requests. One major cause of these vulnerabilities is the lack of an instruction hierarchy. Modern LLM architectures treat all inputs equally, failing to distinguish between and prioritize various types of instructions, such as system messages, user prompts, and data. As a result, lower-priority user prompts may override more critical system instructions, including safety protocols. Existing approaches to achieving instruction hierarchy, such as delimiters and instruction-based training, do not address this issue at the architectural level. We introduce the Instructional Segment Embedding (ISE) technique, inspired by BERT, which embeds instruction-priority information directly into modern large language models. This approach enables models to explicitly differentiate and prioritize various instruction types, significantly improving safety against malicious prompts that attempt to override priority rules. Our experiments on the Structured Query and Instruction Hierarchy benchmarks demonstrate average robust accuracy increases of up to 15.75% and 18.68%, respectively. Furthermore, we observe an improvement in instruction-following capability of up to 4.1%, evaluated on AlpacaEval. Overall, our approach offers a promising direction for enhancing the safety and effectiveness of LLM architectures.

Large Language Models (LLMs) have shown significant potential in enabling sophisticated agentic applications and facilitating autonomous decision-making across various domains, such as web agents, educational tools, and medical assistance (Yao et al., 2022; Gan et al., 2023; Abbasian et al., 2024). To optimize the use of such AI applications, a structured approach to implementation is widely adopted, with clear distinctions among system instructions, user prompts, and data inputs, as illustrated in Figure 1. These instructions carry specific priorities that help the model execute its functionality correctly and better assist users. Modern LLMs, however, process text without formal mechanisms to differentiate and prioritize instructions. Consequently, malicious attackers can easily exploit this limitation to override priority rules, leading to various vulnerabilities. Prompt extraction attacks (Zhang et al., 2024), for example, aim to extract system messages, revealing proprietary prompts.

[Figure 1: A demonstration of the hierarchy of instructions, including system messages, user prompts, and data.]
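To make the architectural idea concrete, here is a minimal sketch of how an instruction-type segment embedding could be added to a decoder-only transformer's input representation, in the spirit of BERT's segment embeddings. The class name, dimensions, and the three-way system/user/data split are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of Instructional Segment Embedding (ISE), assuming a
# decoder-only transformer; names, dimensions, and the segment IDs are
# illustrative assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn

class ISEEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden_dim: int, num_segments: int = 3):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, hidden_dim)
        # One learned vector per instruction type, e.g. 0=system, 1=user, 2=data.
        self.segment_embedding = nn.Embedding(num_segments, hidden_dim)

    def forward(self, token_ids: torch.Tensor, segment_ids: torch.Tensor) -> torch.Tensor:
        # Both inputs have shape (batch, seq_len); segment_ids marks each
        # token's place in the instruction hierarchy.
        return self.token_embedding(token_ids) + self.segment_embedding(segment_ids)

# Toy usage: a prompt whose first tokens are system instructions,
# followed by user tokens and then data tokens.
embed = ISEEmbedding(vocab_size=32000, hidden_dim=4096)
token_ids = torch.randint(0, 32000, (1, 8))
segment_ids = torch.tensor([[0, 0, 0, 1, 1, 2, 2, 2]])
hidden = embed(token_ids, segment_ids)  # (1, 8, 4096), fed to the transformer blocks
```

Because the priority information is injected at the embedding layer, every subsequent attention layer can condition on it, rather than relying on delimiters surviving in the token stream.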
Certifiably Robust RAG against Retrieval Corruption
Xiang, Chong, Wu, Tong, Zhong, Zexuan, Wagner, David, Chen, Danqi, Mittal, Prateek
Retrieval-augmented generation (RAG) has been shown to be vulnerable to retrieval corruption attacks: an attacker can inject malicious passages into retrieval results to induce inaccurate responses. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we obtain LLM responses from each passage in isolation and then securely aggregate these isolated responses. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG can achieve certifiable robustness: we can formally prove and certify that, for certain queries, RobustRAG will always return accurate responses, even when the attacker has full knowledge of our defense and can arbitrarily inject a small number of malicious passages. We evaluate RobustRAG on open-domain QA and long-form text generation datasets and demonstrate its effectiveness and generalizability across various tasks and datasets.
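Below is a hedged sketch of the isolate-then-aggregate strategy with a simple keyword-vote aggregation. The `generate` callable stands in for any LLM API, and the word-level keyword extraction and `min_votes` threshold are simplified assumptions rather than the paper's exact algorithm.

```python
# Hedged sketch of RobustRAG's isolate-then-aggregate idea with a
# keyword-style aggregation; `generate` is a stand-in for any LLM call.
from collections import Counter
from typing import Callable, List

def robust_rag_keyword(query: str,
                       passages: List[str],
                       generate: Callable[[str], str],
                       min_votes: int = 3) -> str:
    # 1) Isolation: answer the query from each passage independently, so one
    #    corrupted passage cannot influence the other isolated responses.
    isolated = [generate(f"Context: {p}\nQuestion: {query}\nAnswer:") for p in passages]

    # 2) Keyword voting: count how many isolated responses mention each word
    #    (a real system would extract salient keywords or phrases instead).
    votes = Counter()
    for resp in isolated:
        votes.update(set(resp.lower().split()))
    kept = [w for w, c in votes.items() if c >= min_votes]

    # 3) Secure aggregation: produce the final answer only from keywords that
    #    enough independent passages agree on.
    return generate(f"Question: {query}\nRelevant keywords: {', '.join(kept)}\nAnswer:")
```

Because each isolated response is produced from a single passage, a small number of injected passages can only contribute a small number of votes, which is what makes certification over the aggregated keywords possible.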
Position Paper: Beyond Robustness Against Single Attack Types
Dai, Sihui, Xiang, Chong, Wu, Tong, Mittal, Prateek
Current research on defending against adversarial examples focuses primarily on achieving robustness against a single attack type, such as $\ell_2$- or $\ell_{\infty}$-bounded attacks. However, the space of possible perturbations is much larger and currently cannot be modeled by a single attack type. The discrepancy between the focus of current defenses and the space of attacks of interest calls into question the practicality of existing defenses and the reliability of their evaluation. In this position paper, we argue that the research community should look beyond single-attack robustness, and we draw attention to three potential directions involving robustness against multiple attacks: simultaneous multiattack robustness, unforeseen attack robustness, and a newly defined problem setting which we call continual adaptive robustness. We provide a unified framework that rigorously defines these problem settings, synthesize existing research in these fields, and outline open directions. We hope that our position paper inspires more research in simultaneous multiattack, unforeseen attack, and continual adaptive robustness.
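As a rough illustration of the kind of unified formulation the paper advocates, simultaneous multiattack robustness can be written as a worst-case objective over a union of perturbation sets; the notation below is a sketch of this standard formulation, not necessarily the paper's exact definition.

```latex
% Sketch: worst-case risk over K attack types with perturbation sets S_1, ..., S_K
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
\Big[ \max_{1 \le k \le K} \; \max_{\delta \in S_k(x)}
\ell\big(f_\theta(x + \delta),\, y\big) \Big]
```

Loosely, unforeseen attack robustness then corresponds to the test-time sets $S_k$ being unknown when training, and continual adaptive robustness to the attacker's capabilities changing over time.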
MultiRobustBench: Benchmarking Robustness Against Multiple Attacks
Dai, Sihui, Mahloujifar, Saeed, Xiang, Chong, Sehwag, Vikash, Chen, Pin-Yu, Mittal, Prateek
The bulk of existing research on defending against adversarial examples focuses on defending against a single (typically $\ell_p$-bounded) attack, but in practical settings, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework can model different levels of the learner's knowledge about the test-time adversary, allowing us to model robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench, for benchmarking multiattack evaluation, which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including $\ell_p$-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks in total). Additionally, we analyze the state of current defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack remains a major open problem, as all existing models perform worse than random guessing.
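To illustrate the two aggregate views such a leaderboard reports, here is a minimal sketch that summarizes one model's per-attack, per-strength robust accuracies into an average and a worst case; the data layout and function name are illustrative assumptions, not MultiRobustBench's code.

```python
# Hedged sketch of the aggregation a multiattack leaderboard needs: given
# per-(attack, strength) robust accuracies for one model, report both the
# average and the worst case across all attacks.
import numpy as np

def summarize_multiattack(robust_acc: np.ndarray) -> dict:
    # robust_acc has shape (num_attack_types, num_strengths), e.g. (9, 20),
    # where each entry is robust accuracy against that attack at that strength.
    return {
        "average_robust_acc": float(robust_acc.mean()),
        # Worst case over every attack type and strength: the quantity the
        # analysis above finds is still below random guessing for all models.
        "worst_case_robust_acc": float(robust_acc.min()),
    }

# Toy usage with random numbers in place of measured accuracies.
scores = np.random.uniform(0.0, 0.6, size=(9, 20))
print(summarize_multiattack(scores))
```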
PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking
Xiang, Chong, Bhagoji, Arjun Nitin, Sehwag, Vikash, Mittal, Prateek
Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image. Such attacks can be realized in the physical world by attaching the adversarial patch to the object to be misclassified, and defending against them remains an open problem. In this paper, we propose a general defense framework called PatchGuard that achieves high provable robustness while maintaining high clean accuracy against localized adversarial patches. The cornerstone of PatchGuard is the use of CNNs with small receptive fields, which bounds the number of features that an adversarial patch can corrupt. Given a bounded number of corrupted features, the problem of designing an adversarial patch defense reduces to that of designing a secure feature aggregation mechanism. Towards this end, we present our robust masking defense, which robustly detects and masks corrupted features to recover the correct prediction. Our extensive evaluation on the ImageNet, ImageNette (a 10-class subset of ImageNet), and CIFAR-10 datasets demonstrates that our defense achieves state-of-the-art performance in terms of both provable robust accuracy and clean accuracy.
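The following is a simplified sketch of the robust-masking intuition: because a small receptive field bounds how many feature cells a patch can corrupt, an unusually concentrated window of class evidence can be detected and masked before aggregation. The window size, threshold, and single-class formulation are illustrative assumptions, not PatchGuard's exact algorithm.

```python
# Simplified sketch of robust masking over a class-evidence map produced by a
# small-receptive-field CNN; window size and threshold are illustrative.
import numpy as np

def robust_masking_score(evidence: np.ndarray, window: int = 6, gamma: float = 0.6) -> float:
    # evidence: (H, W) non-negative class-evidence map for a single class.
    H, W = evidence.shape
    best_sum, best_ij = 0.0, None
    # Find the window that concentrates the most evidence; a patch can only
    # corrupt cells inside one such window thanks to the small receptive field.
    for i in range(H - window + 1):
        for j in range(W - window + 1):
            s = float(evidence[i:i + window, j:j + window].sum())
            if s > best_sum:
                best_sum, best_ij = s, (i, j)
    total = float(evidence.sum())
    # If one window holds a suspiciously large share of the evidence, mask it out.
    if best_ij is not None and total > 0 and best_sum / total > gamma:
        i, j = best_ij
        evidence = evidence.copy()
        evidence[i:i + window, j:j + window] = 0.0
    return float(evidence.sum())  # aggregate the remaining evidence for this class
```

Running this per class and predicting the highest-scoring class mimics the secure aggregation step; the actual defense additionally reasons about all possible patch locations to certify robustness.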