AITopics | Mao, Chengzhi

Collaborating Authors

Mao, Chengzhi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diversity Helps Jailbreak Large Language Models

Zhao, Weiliang, Ben-Levi, Daniel, Yang, Junfeng, Mao, Chengzhi

arXiv.org Artificial IntelligenceNov-6-2024

We have uncovered a powerful jailbreak technique that leverages large language models' ability to diverge from prior context, enabling them to bypass safety constraints and generate harmful outputs. By simply instructing the LLM to deviate and obfuscate previous attacks, our method dramatically outperforms existing approaches, achieving up to a 62% higher success rate in compromising nine leading chatbots, including GPT-4, Gemini, and Llama, while using only 13% of the queries. This revelation exposes a critical flaw in current LLM safety training, suggesting that existing methods may merely mask vulnerabilities rather than eliminate them. Our findings sound an urgent alarm for the need to revolutionize testing methodologies to ensure robust and reliable LLM security. Figure 1: Example of our successful jailbreak attack on OpenAI's new model GPT-4o, where the attack goal is to write a tutorial on how to make and distribute counterfeit goods. The adversarial prompt was generated during ...

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.04223

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.93)
Law Enforcement & Public Safety (0.93)
Media > News (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

I Can Hear You: Selective Robust Training for Deepfake Audio Detection

Zhang, Zirui, Hao, Wei, Sankoh, Aroon, Lin, William, Mendiola-Ortiz, Emanuel, Yang, Junfeng, Mao, Chengzhi

arXiv.org Artificial IntelligenceOct-31-2024

Recent advances in AI-generated voices have intensified the challenge of detecting deepfake audio, posing risks for scams and the spread of disinformation. To tackle this issue, we establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples, including 270,000 high-quality deepfake samples from 14 diverse sources. Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset, and their detection success rates drop even further under realistic corruptions and adversarial attacks. We conduct a holistic investigation into factors that enhance model robustness and show that incorporating a diversified set of voice augmentations is beneficial. Moreover, we find that the best detection models often rely on high-frequency features, which are imperceptible to humans and can be easily manipulated by an attacker. To address this, we propose the F-SAT: Frequency-Selective Adversarial Training method focusing on high-frequency components. Empirical results demonstrate that using our training dataset boosts baseline model performance (without robust training) by 33%, and our robust training further improves accuracy by 7.7% on clean samples and by 29.3% on corrupted and attacked samples, over the state-of-the-art RawNet3 model.

artificial intelligence, arxiv preprint arxiv, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.00121

Country: North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SPIN: Self-Supervised Prompt INjection

Zhou, Leon, Yang, Junfeng, Mao, Chengzhi

arXiv.org Artificial IntelligenceOct-17-2024

Large Language Models (LLMs) are increasingly used in a variety of important applications, yet their safety and reliability remain as major concerns. Various adversarial and jailbreak attacks have been proposed to bypass the safety alignment and cause the model to produce harmful responses. We introduce Self-supervised Prompt INjection (SPIN) which can detect and reverse these various attacks on LLMs. As our self-supervised prompt defense is done at inference-time, it is also compatible with existing alignment and adds an additional layer of safety for defense. Our benchmarks demonstrate that our system can reduce the attack success rate by up to 87.9%, while maintaining the performance on benign user requests. In addition, we discuss the situation of an adaptive attacker and show that our method is still resilient against attackers who are aware of our defense.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.13236

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Add feedback

RAFT: Realistic Attacks to Fool Text Detectors

Wang, James, Li, Ran, Yang, Junfeng, Mao, Chengzhi

arXiv.org Artificial IntelligenceOct-4-2024

Large language models (LLMs) have exhibited remarkable fluency across various tasks. However, their unethical applications, such as disseminating disinformation, have become a growing concern. Although recent works have proposed a number of LLM detection methods, their robustness and reliability remain unclear. In this paper, we present RAFT: a grammar error-free black-box attack against existing LLM detectors. In contrast to previous attacks for language models, our method exploits the transferability of LLM embeddings at the word-level while preserving the original text quality. We leverage an auxiliary embedding to greedily select candidate words to perturb against the target detector. Experiments reveal that our attack effectively compromises all detectors in the study across various domains by up to 99%, and are transferable across source models. Manual human evaluation studies show our attacks are realistic and indistinguishable from original human-written text. We also show that examples generated by RAFT can be used to train adversarially robust detectors. Our work shows that current LLM detectors are not adversarially robust, underscoring the urgent need for more resilient detection mechanisms.

detector, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.03658

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

Zhang, Chenshuang, Pan, Fei, Kim, Junmo, Kweon, In So, Mao, Chengzhi

arXiv.org Artificial IntelligenceMar-27-2024

We establish rigorous benchmarks for visual perception robustness. Synthetic images such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide specific type of evaluation over synthetic corruptions, backgrounds, and textures, yet those robustness benchmarks are restricted in specified variations and have low synthetic quality. In this work, we introduce generative model as a data source for synthesizing hard images that benchmark deep models' robustness. Leveraging diffusion models, we are able to generate images with more diversified backgrounds, textures, and materials than any prior work, where we term this benchmark as ImageNet-D. Experimental results show that ImageNet-D results in a significant accuracy drop to a range of vision models, from the standard ResNet visual classifier to the latest foundation models like CLIP and MiniGPT-4, significantly reducing their accuracy by up to 60\%. Our work suggests that diffusion models can be an effective source to test vision models. The code and dataset are available at https://github.com/chenshuang-zhang/imagenet_d.

imagenet-d, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2403.18775

Country:

North America > United States > Michigan (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.35)

Industry:

Information Technology > Security & Privacy (0.46)
Leisure & Entertainment > Sports > Tennis (0.30)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

SelfIE: Self-Interpretation of Large Language Model Embeddings

Chen, Haozhe, Vondrick, Carl, Mao, Chengzhi

arXiv.org Artificial IntelligenceMar-25-2024

How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliability, transparency, and future model developments. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE's text descriptions on hidden embeddings also open up new avenues to control LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation of individual layer. We extend RLHF to hidden embeddings and propose Reinforcement Control that erases harmful knowledge in LLM without supervision targets.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2403.10949

Country:

North America > United States (1.00)
Asia (0.68)
North America > Canada > Quebec > Montreal (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Raidar: geneRative AI Detection viA Rewriting

Mao, Chengzhi, Vondrick, Carl, Wang, Hao, Yang, Junfeng

arXiv.org Artificial IntelligenceJan-23-2024

We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting. This tendency arises because LLMs often perceive AI-generated text as high-quality, leading to fewer modifications. We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and calculating the editing distance of the output. We dubbed our geneRative AI Detection viA Rewriting method Raidar. Raidar significantly improves the F1 detection scores of existing AI content detection models -- both academic and commercial -- across various domains, including News, creative writing, student essays, code, Yelp reviews, and arXiv papers, with gains of up to 29 points. Operating solely on word symbols without high-dimensional features, our method is compatible with black box LLMs, and is inherently robust on new content. Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2401.1297

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry:

Information Technology > Security & Privacy (1.00)
Food & Agriculture > Agriculture (1.00)
Education > Curriculum > Subject-Specific Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)

Add feedback

Towards Causal Deep Learning for Vulnerability Detection

Rahman, Md Mahbubur, Ceka, Ira, Mao, Chengzhi, Chakraborty, Saikat, Ray, Baishakhi, Le, Wei

arXiv.org Artificial IntelligenceJan-14-2024

Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still blocks it from being very useful in practice is that the model is not robust under perturbation and it cannot generalize well over the out-of-distribution (OOD) data, e.g., applying a trained model to unseen projects in real world. We hypothesize that this is because the model learned non-robust features, e.g., variable names, that have spurious correlations with labels. When the perturbed and OOD datasets no longer have the same spurious features, the model prediction fails. To address the challenge, in this paper, we introduced causality into deep learning vulnerability detection. Our approach CausalVul consists of two phases. First, we designed novel perturbations to discover spurious features that the model may use to make predictions. Second, we applied the causal learning algorithms, specifically, do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causal based prediction. Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance for all the state-of-the-art models and datasets we experimented. To the best of our knowledge, this is the first work that introduces do calculus based causal learning to software engineering models and shows it's indeed useful for improving the model accuracy, robustness and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.

artificial intelligence, machine learning, spurious feature, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3597503.3639170

2310.07958

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Robustifying Language Models with Test-Time Adaptation

McDermott, Noah Thomas, Yang, Junfeng, Mao, Chengzhi

arXiv.org Artificial IntelligenceOct-29-2023

Large-scale language models achieved state-of-the-art performance over a number of language tasks. However, they fail on adversarial language examples, which are sentences optimized to fool the language models but with similar semantic meanings for humans. While prior work focuses on making the language model robust at training time, retraining for robustness is often unrealistic for large-scale foundation models. Instead, we propose to make the language models robust at test time. By dynamically adapting the input sentence with predictions from masked words, we show that we can reverse many language adversarial attacks. Since our approach does not require any training, it works for novel tasks at test time and can adapt to novel adversarial corruptions. Visualizations and empirical results on two popular sentence classification datasets demonstrate that our method can repair adversarial language attacks over 65% o

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2310.19177

Country:

Europe (0.47)
North America > United States > California (0.14)

Genre: Research Report (0.65)

Industry: Government > Military (0.38)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Convolutional Visual Prompt for Robust Visual Perception

Tsai, Yun-Yun, Mao, Chengzhi, Yang, Junfeng

arXiv.org Artificial IntelligenceOct-25-2023

Vision models are often vulnerable to out-of-distribution (OOD) samples without adapting. While visual prompts offer a lightweight method of input-space adaptation for large-scale vision models, they rely on a high-dimensional additive vector and labeled data. This leads to overfitting when adapting models in a self-supervised test-time setting without labels. We introduce convolutional visual prompts (CVP) for label-free test-time adaptation for robust visual perception. The structured nature of CVP demands fewer trainable parameters, less than 1\% compared to standard visual prompts, combating overfitting. Extensive experiments and analysis on a wide variety of OOD visual perception tasks show that our approach is effective, improving robustness by up to 5.87% over several large-scale models.

convolutional visual prompt, robust visual perception

arXiv.org Artificial Intelligence

2303.00198

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

Add feedback