AITopics | Zhang, Tingwei

Collaborating Authors

Zhang, Tingwei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Controlled Generation of Natural Adversarial Documents for Stealthy Retrieval Poisoning

Zhang, Collin, Zhang, Tingwei, Shmatikov, Vitaly

arXiv.org Artificial IntelligenceOct-2-2024

Recent work showed that retrieval based on embedding similarity (e.g., for retrieval-augmented generation) is vulnerable to poisoning: an adversary can craft malicious documents that are retrieved in response to broad classes of queries. We demonstrate that previous, HotFlip-based techniques produce documents that are very easy to detect using perplexity filtering. Even if generation is constrained to produce low-perplexity text, the resulting documents are recognized as unnatural by LLMs and can be automatically filtered from the retrieval corpus. We design, implement, and evaluate a new controlled generation technique that combines an adversarial objective (embedding similarity) with a "naturalness" objective based on soft scores computed using an open-source, surrogate LLM. The resulting adversarial documents (1) cannot be automatically detected using perplexity filtering and/or other LLMs, except at the cost of significant false positives in the retrieval corpus, yet (2) achieve similar poisoning efficacy to easilydetectable documents generated using HotFlip, and (3) are significantly more effective than prior methods for energy-guided generation, such as COLD. Many modern retrieval systems use embeddings, i.e., dense vector representations, of documents and queries to enable retrieval based on semantic similarity. Chaudhari et al. (2024) and Zhong et al. (2023) recently demonstrated that an adversary can use HotFlip Ebrahimi et al. (2018) to generate documents whose embeddings have high similarity to, and will thus be retrieved in response to, broad classes of queries. We first demonstrate that adversarial documents produced by HotFlip have much higher perplexity than normal text and can be filtered out with negligible collateral damage (i.e., false positives).

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.02163

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports > Olympic Games (1.00)
Leisure & Entertainment > Games > Computer Games (0.74)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)

Add feedback

Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions

Zhang, Tingwei, Zhang, Collin, Morris, John X., Bagdasaryan, Eugene, Shmatikov, Vitaly

arXiv.org Artificial IntelligenceJul-11-2024

We introduce a new type of indirect injection vulnerabilities in language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, sentiment, or point of view. We explain how to create meta-instructions by generating images that act as soft prompts. Unlike jailbreaking attacks and adversarial examples, the outputs resulting from these images are plausible and based on the visual content of the image, yet follow the adversary's (meta-)instructions. We describe the risks of these attacks, including misinformation and spin, evaluate their efficacy for multiple visual language models and adversarial meta-objectives, and demonstrate how they can "unlock" the capabilities of the underlying language models that are unavailable via explicit text instructions. Finally, we discuss defenses against these attacks.

hidden meta-instruction, steering visual language model

arXiv.org Artificial Intelligence

2407.0897

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Visual Languages (0.60)

Add feedback

SoK: Pitfalls in Evaluating Black-Box Attacks

Suya, Fnu, Suri, Anshuman, Zhang, Tingwei, Hong, Jingtao, Tian, Yuan, Evans, David

arXiv.org Artificial IntelligenceOct-26-2023

Numerous works study black-box attacks on image classifiers. However, these works make different assumptions on the adversary's knowledge and current literature lacks a cohesive organization centered around the threat model. To systematize knowledge in this area, we propose a taxonomy over the threat space spanning the axes of feedback granularity, the access of interactive queries, and the quality and quantity of the auxiliary data available to the attacker. Our new taxonomy provides three key insights. 1) Despite extensive literature, numerous under-explored threat spaces exist, which cannot be trivially solved by adapting techniques from well-explored settings. We demonstrate this by establishing a new state-of-the-art in the less-studied setting of access to top-k confidence scores by adapting techniques from well-explored settings of accessing the complete confidence vector, but show how it still falls short of the more restrictive setting that only obtains the prediction label, highlighting the need for more research. 2) Identification the threat model of different attacks uncovers stronger baselines that challenge prior state-of-the-art claims. We demonstrate this by enhancing an initially weaker baseline (under interactive query access) via surrogate models, effectively overturning claims in the respective paper. 3) Our taxonomy reveals interactions between attacker knowledge that connect well to related areas, such as model inversion and extraction attacks. We discuss how advances in other areas can enable potentially stronger black-box attacks. Finally, we emphasize the need for a more realistic assessment of attack success by factoring in local attack runtime. This approach reveals the potential for certain attacks to achieve notably higher success rates and the need to evaluate attacks in diverse and harder settings, highlighting the need for better selection criteria.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2310.17534

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Air (0.85)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)
(4 more...)

Add feedback

Adversarial Illusions in Multi-Modal Embeddings

Bagdasaryan, Eugene, Jha, Rishi, Zhang, Tingwei, Shmatikov, Vitaly

arXiv.org Artificial IntelligenceOct-6-2023

Multi-modal embeddings encode images, sounds, texts, videos, etc. into a single embedding space, aligning representations across modalities (e.g., associate an image of a dog with a barking sound). We show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an image or a sound, an adversary can perturb it so as to make its embedding close to an arbitrary, adversary-chosen input in another modality. This enables the adversary to align any image and any sound with any text. Adversarial illusions exploit proximity in the embedding space and are thus agnostic to downstream tasks. Using ImageBind embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, and zero-shot classification.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.11804

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.90)
Information Technology > Artificial Intelligence > Vision (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Add feedback