AITopics | Zhang, Collin

Zhang, Collin

Controlled Generation of Natural Adversarial Documents for Stealthy Retrieval Poisoning

Zhang, Collin, Zhang, Tingwei, Shmatikov, Vitaly

arXiv.org Artificial Intelligence, Oct-2-2024

Recent work showed that retrieval based on embedding similarity (e.g., for retrieval-augmented generation) is vulnerable to poisoning: an adversary can craft malicious documents that are retrieved in response to broad classes of queries. We demonstrate that previous, HotFlip-based techniques produce documents that are very easy to detect using perplexity filtering. Even if generation is constrained to produce low-perplexity text, the resulting documents are recognized as unnatural by LLMs and can be automatically filtered from the retrieval corpus. We design, implement, and evaluate a new controlled generation technique that combines an adversarial objective (embedding similarity) with a "naturalness" objective based on soft scores computed using an open-source, surrogate LLM. The resulting adversarial documents (1) cannot be automatically detected using perplexity filtering and/or other LLMs, except at the cost of significant false positives in the retrieval corpus, yet (2) achieve similar poisoning efficacy to easily detectable documents generated using HotFlip, and (3) are significantly more effective than prior methods for energy-guided generation, such as COLD.

Many modern retrieval systems use embeddings, i.e., dense vector representations, of documents and queries to enable retrieval based on semantic similarity. Chaudhari et al. (2024) and Zhong et al. (2023) recently demonstrated that an adversary can use HotFlip (Ebrahimi et al., 2018) to generate documents whose embeddings have high similarity to, and will thus be retrieved in response to, broad classes of queries. We first demonstrate that adversarial documents produced by HotFlip have much higher perplexity than normal text and can be filtered out with negligible collateral damage (i.e., false positives).
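To make the idea of perplexity filtering concrete, here is a minimal sketch. It is not the authors' implementation: the paper measures perplexity with an LLM, while this example substitutes a toy unigram model with add-one smoothing so that it runs without any dependencies. The function names, the reference corpus, the sample documents, and the threshold of 15 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Count word frequencies in a reference corpus of natural text."""
    counts = Counter(word for doc in corpus for word in doc.lower().split())
    return counts, sum(counts.values())

def perplexity(text, counts, total):
    """Unigram perplexity with add-one smoothing (toy stand-in for an LLM)."""
    words = text.lower().split()
    if not words:
        return float("inf")
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))

def perplexity_filter(docs, counts, total, threshold):
    """Keep only documents whose perplexity does not exceed the threshold."""
    return [d for d in docs if perplexity(d, counts, total) <= threshold]

# Hypothetical reference corpus and candidate documents.
reference = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog sat on the mat",
]
counts, total = train_unigram(reference)

natural = "the dog sat on the mat"
garbled = "zx qv lorem wug blick fnord"  # stands in for a token-flipped document

kept = perplexity_filter([natural, garbled], counts, total, threshold=15.0)
print(kept)
```

In this toy setting the garbled string consists entirely of unseen words, so it scores far above the threshold and is dropped, while the natural sentence is kept. A real deployment would choose the threshold from the perplexity distribution of the clean corpus, which is where the false-positive trade-off discussed in the abstract arises.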

large language model, machine learning, natural language, (20 more...)

2410.02163

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports > Olympic Games (1.00)
Leisure & Entertainment > Games > Computer Games (0.74)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)

Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions

Zhang, Tingwei, Zhang, Collin, Morris, John X., Bagdasaryan, Eugene, Shmatikov, Vitaly

arXiv.org Artificial Intelligence, Jul-11-2024

We introduce a new type of indirect injection vulnerability in language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, sentiment, or point of view. We explain how to create meta-instructions by generating images that act as soft prompts. Unlike jailbreaking attacks and adversarial examples, the outputs resulting from these images are plausible and based on the visual content of the image, yet follow the adversary's (meta-)instructions. We describe the risks of these attacks, including misinformation and spin, evaluate their efficacy for multiple visual language models and adversarial meta-objectives, and demonstrate how they can "unlock" capabilities of the underlying language models that are unavailable via explicit text instructions. Finally, we discuss defenses against these attacks.

hidden meta-instruction, steering visual language model

2407.0897

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Visual Languages (0.60)

Extracting Prompts by Inverting LLM Outputs

Zhang, Collin, Morris, John X., Shmatikov, Vitaly

arXiv.org Artificial Intelligence, May-23-2024

We consider the problem of language model inversion: given outputs of a language model, we seek to extract the prompt that generated these outputs. We develop a new black-box method, output2prompt, that learns to extract prompts without access to the model's logits and without adversarial or jailbreaking queries. In contrast to previous work, output2prompt only needs outputs of normal user queries. To improve memory efficiency, output2prompt employs a new sparse encoding technique. We measure the efficacy of output2prompt on a variety of user and system prompts and demonstrate zero-shot transferability across different LLMs.

large language model, machine learning, natural language, (16 more...)

2405.15012

Country: North America (0.14)

Genre:

Research Report (1.00)
Personal > Interview (0.92)

Industry:

Transportation (0.66)
Energy > Oil & Gas (0.46)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
