AITopics | Bagdasaryan, Eugene

Bagdasaryan, Eugene

Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions

Zhang, Tingwei, Zhang, Collin, Morris, John X., Bagdasaryan, Eugene, Shmatikov, Vitaly

arXiv.org Artificial Intelligence, Jul-11-2024

We introduce a new type of indirect injection vulnerability in language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, sentiment, or point of view. We explain how to create meta-instructions by generating images that act as soft prompts. Unlike jailbreaking attacks and adversarial examples, the outputs resulting from these images are plausible and based on the visual content of the image, yet follow the adversary's (meta-)instructions. We describe the risks of these attacks, including misinformation and spin, evaluate their efficacy for multiple visual language models and adversarial meta-objectives, and demonstrate how they can "unlock" capabilities of the underlying language models that are unavailable via explicit text instructions. Finally, we discuss defenses against these attacks.

hidden meta-instruction, steering visual language model

arXiv.org Artificial Intelligence

2407.08970

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Visual Languages (0.60)

UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

Shumailov, Ilia, Hayes, Jamie, Triantafillou, Eleni, Ortiz-Jimenez, Guillermo, Papernot, Nicolas, Jagielski, Matthew, Yona, Itay, Howard, Heidi, Bagdasaryan, Eugene

arXiv.org Artificial Intelligence, Jun-27-2024

Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently, unlearning is often discussed as an approach for removing impermissible knowledge, i.e., knowledge that the model should not possess, such as unlicensed copyrighted, inaccurate, or malicious information. The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose. In this paper, we revisit the paradigm in which unlearning is used in Large Language Models (LLMs) and highlight an underlying inconsistency arising from in-context learning. Unlearning can be an effective control mechanism for the training phase, yet it does not prevent the model from performing an impermissible act during inference. We introduce the concept of ununlearning, where unlearned knowledge gets reintroduced in-context, effectively rendering the model capable of behaving as if it knows the forgotten knowledge. As a result, we argue that content filtering for impermissible knowledge will be required and that even exact unlearning schemes are not enough for effective content regulation. We discuss the feasibility of ununlearning for modern LLMs and examine broader implications.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2407.00106

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.68)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.42)

Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors

Naseh, Ali, Roh, Jaechul, Bagdasaryan, Eugene, Houmansadr, Amir

arXiv.org Artificial Intelligence, Jun-21-2024

Recent advances in large text-conditional image generative models such as Stable Diffusion, Midjourney, and DALL-E 3 have revolutionized the field of image generation, allowing users to produce high-quality, realistic images from textual prompts. While these developments have enhanced artistic creation and visual communication, they also present an underexplored attack opportunity: an adversary can inject biases into the generated images for malicious purposes, e.g., to influence society and spread propaganda. In this paper, we demonstrate the possibility of such a bias injection threat by an adversary who backdoors such models with a small number of malicious data samples; the implemented backdoor is activated when special triggers exist in the input prompt of the backdoored models. At the same time, the model's utility is preserved in the absence of the triggers, making the attack hard to detect. We present a novel framework that enables efficient generation of poisoning samples with composite (multi-word) triggers for such an attack. Our extensive experiments using over 1 million generated images and against hundreds of fine-tuned models demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases can bypass conventional detection mechanisms, highlighting the challenges in proving the existence of biases within operational constraints. Our cost analysis confirms the low financial barrier to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in text-to-image generation models.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.15213

Country:

North America > United States > Massachusetts (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Media (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.88)

Air Gap: Protecting Privacy-Conscious Conversational Agents

Bagdasaryan, Eugene, Yi, Ren, Ghalebikesabi, Sahra, Kairouz, Peter, Gruteser, Marco, Oh, Sewoong, Balle, Borja, Ramage, Daniel

arXiv.org Artificial Intelligence, May-8-2024

The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand. Grounded in the framework of contextual integrity, we introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task. Extensive experiments using Gemini, GPT, and Mistral models as agents validate our approach's effectiveness in mitigating this form of context hijacking while maintaining core agent functionality. For example, we show that a single-query context hijacking attack on a Gemini Ultra agent reduces its ability to protect user data from 94% to 45%, while an AirGapAgent achieves 97% protection, rendering the same attack ineffective.
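The data-minimization idea in the abstract above (give the agent only the data needed for the task) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `TASK_FIELDS` table, the field names, and the `minimize` and `answer_request` helpers are hypothetical and are not taken from the paper, which may decide relevance in a different way.

```python
from typing import Dict, Set

# Hypothetical mapping from task to the profile fields that task needs.
TASK_FIELDS: Dict[str, Set[str]] = {
    "book_restaurant": {"name", "phone"},
    "schedule_doctor_visit": {"name", "phone", "health_conditions"},
}


def minimize(profile: Dict[str, str], task: str) -> Dict[str, str]:
    """Return only the profile fields allowed for the given task."""
    allowed = TASK_FIELDS.get(task, set())
    return {key: value for key, value in profile.items() if key in allowed}


def answer_request(visible: Dict[str, str], requested_field: str) -> str:
    """Stand-in for the agent: it can only reveal what it was given."""
    return visible.get(requested_field, "<withheld>")


profile = {
    "name": "Alex",
    "phone": "555-0100",
    "health_conditions": "asthma",
}

# The agent handling a restaurant booking never sees the health data,
# so a third party that rewrites the conversation context cannot extract it.
view = minimize(profile, "book_restaurant")
print(answer_request(view, "name"))
print(answer_request(view, "health_conditions"))
```

The design point of the sketch is that the filter runs before the agent sees any third-party input, so a manipulated context cannot widen the set of visible fields.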

information, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2405.05175

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mithridates: Auditing and Boosting Backdoor Resistance of Machine Learning Pipelines

Bagdasaryan, Eugene, Shmatikov, Vitaly

arXiv.org Artificial Intelligence, Dec-19-2023

Machine learning (ML) models trained on data from potentially untrusted sources are vulnerable to poisoning. A small, maliciously crafted subset of the training inputs can cause the model to learn a "backdoor" task (e.g., misclassify inputs with a certain feature) in addition to its main task. Recent research proposed many hypothetical backdoor attacks whose efficacy heavily depends on the configuration and training hyperparameters of the target model. Given the variety of potential backdoor attacks, ML engineers who are not security experts have no way to measure how vulnerable their current training pipelines are, nor do they have a practical way to compare training configurations so as to pick the more resistant ones. Deploying a defense requires evaluating and choosing from among dozens of research papers and re-engineering the training pipeline. In this paper, we aim to provide ML engineers with pragmatic tools to audit the backdoor resistance of their training pipelines and to compare different training configurations, to help choose one that best balances accuracy and security. First, we propose a universal, attack-agnostic resistance metric based on the minimum number of training inputs that must be compromised before the model learns any backdoor. Second, we design, implement, and evaluate Mithridates, a multi-stage approach that integrates backdoor resistance into the training-configuration search. ML developers already rely on hyperparameter search to find configurations that maximize the model's accuracy. Mithridates extends this standard tool to balance accuracy and resistance without disruptive changes to the training pipeline. We show that hyperparameters found by Mithridates increase resistance to multiple types of backdoor attacks by 3-5x with only a slight impact on accuracy. We also discuss extensions to AutoML and federated learning.
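The resistance metric described in the abstract above (the smallest amount of compromised training data at which a backdoor is learned) can be sketched as a simple search. This is only an illustration under stated assumptions: `min_poison_fraction` is a hypothetical helper, and the two `simulated_*` functions are made-up stand-ins for "train with this poisoned fraction and measure backdoor accuracy", not results or code from the paper.

```python
from typing import Callable, Optional, Sequence


def min_poison_fraction(
    backdoor_accuracy: Callable[[float], float],
    fractions: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Smallest candidate fraction at which the backdoor is learned, else None."""
    for fraction in sorted(fractions):
        if backdoor_accuracy(fraction) >= threshold:
            return fraction
    return None


# Made-up stand-ins for two training configurations (assumption, not data).
def simulated_config_a(fraction: float) -> float:
    return 0.9 if fraction >= 0.005 else 0.0


def simulated_config_b(fraction: float) -> float:
    return 0.9 if fraction >= 0.02 else 0.0


candidates = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05]
resistance_a = min_poison_fraction(simulated_config_a, candidates, threshold=0.5)
resistance_b = min_poison_fraction(simulated_config_b, candidates, threshold=0.5)

# A higher value means more data must be compromised, i.e. more resistance.
print(resistance_a, resistance_b)
```

In a real pipeline each call to the accuracy function would be a full training run, which is why folding the measurement into an existing hyperparameter search, as the abstract describes, is attractive.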

accuracy, artificial intelligence, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2302.04977

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Adversarial Illusions in Multi-Modal Embeddings

Bagdasaryan, Eugene, Jha, Rishi, Zhang, Tingwei, Shmatikov, Vitaly

arXiv.org Artificial Intelligence, Oct-6-2023

Multi-modal embeddings encode images, sounds, texts, videos, etc. into a single embedding space, aligning representations across modalities (e.g., associating an image of a dog with a barking sound). We show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an image or a sound, an adversary can perturb it so as to make its embedding close to an arbitrary, adversary-chosen input in another modality. This enables the adversary to align any image and any sound with any text. Adversarial illusions exploit proximity in the embedding space and are thus agnostic to downstream tasks. Using ImageBind embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, and zero-shot classification.
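The core step in the abstract above, perturbing an input so that its embedding moves toward a chosen target, can be sketched on a toy problem. The sketch assumes a hypothetical linear encoder `W` in place of a real multi-modal model and uses projected gradient descent with a per-coordinate bound `eps`; the optimization procedure is chosen here for illustration and is not confirmed by the abstract.

```python
import math

# Hypothetical toy encoder: a fixed linear map from 3-d inputs to a 2-d embedding.
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]


def embed(x):
    """Apply the toy encoder to an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def craft_illusion(x, target_embedding, eps=0.1, step=0.1, iters=50):
    """Projected gradient descent on ||embed(x + delta) - target||^2, |delta_i| <= eps."""
    delta = [0.0] * len(x)
    for _ in range(iters):
        perturbed = [xi + di for xi, di in zip(x, delta)]
        residual = [e - t for e, t in zip(embed(perturbed), target_embedding)]
        # Gradient of the squared distance with respect to delta: 2 * W^T * residual.
        grad = [
            2.0 * sum(W[r][c] * residual[r] for r in range(len(W)))
            for c in range(len(x))
        ]
        # Gradient step, then clamp back into the allowed perturbation box.
        delta = [max(-eps, min(eps, di - step * gi)) for di, gi in zip(delta, grad)]
    return delta


source_input = [1.0, 0.0, 0.0]   # stands in for an image
target_embedding = [0.0, 1.0]    # stands in for the embedding of an adversary-chosen text

delta = craft_illusion(source_input, target_embedding)
adversarial_input = [xi + di for xi, di in zip(source_input, delta)]

before = distance(embed(source_input), target_embedding)
after = distance(embed(adversarial_input), target_embedding)
print(before, after)
```

Because the loss depends only on the embedding, nothing in the procedure refers to a downstream task, which matches the abstract's point that the attack is task-agnostic.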

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.11804

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.90)
Information Technology > Artificial Intelligence > Vision (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Bagdasaryan, Eugene, Hsieh, Tsung-Yin, Nassi, Ben, Shmatikov, Vitaly

arXiv.org Artificial Intelligence, Oct-3-2023

We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVA and PandaGPT.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2307.10490

Genre: Research Report (0.50)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Differential Privacy Has Disparate Impact on Model Accuracy

Bagdasaryan, Eugene, Shmatikov, Vitaly

arXiv.org Machine Learning, May-28-2019

Differential privacy (DP) is a popular mechanism for training machine learning models with bounded leakage about the presence of specific points in the training data. The cost of differential privacy is a reduction in the model's accuracy. We demonstrate that this cost is not borne equally: accuracy of DP models drops much more for the underrepresented classes and subgroups. For example, a DP gender classification model exhibits much lower accuracy for black faces than for white faces. Critically, this gap is bigger in the DP model than in the non-DP model, i.e., if the original model is unfair, the unfairness becomes worse once DP is applied. We demonstrate this effect for a variety of tasks and models, including sentiment analysis of text and image classification. We then explain why DP training mechanisms such as gradient clipping and noise addition have a disproportionate effect on the underrepresented and more complex subgroups, resulting in a disparate reduction of model accuracy.
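The two mechanisms named in the abstract above, gradient clipping and noise addition, can be sketched as one averaged gradient computation in the style of DP-SGD. The numbers are illustrative assumptions, not data from the paper: 98 "majority" examples with small gradients and 2 "minority" examples with large gradients, with hypothetical helper names throughout.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a gradient vector."""
    return math.sqrt(sum(v * v for v in vec))


def clip(grad, clip_norm):
    """Scale the gradient down so that its L2 norm is at most clip_norm."""
    norm = l2_norm(grad)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


def retained_fraction(grad, clip_norm):
    """Share of a gradient's norm that survives clipping."""
    return l2_norm(clip(grad, clip_norm)) / l2_norm(grad)


# Illustrative assumption: the underrepresented group produces larger gradients.
majority = [[0.5, 0.0]] * 98
minority = [[0.0, 10.0]] * 2

rng = random.Random(0)
update = dp_average_gradient(
    majority + minority, clip_norm=1.0, noise_multiplier=1.0, rng=rng
)

majority_kept = retained_fraction(majority[0], clip_norm=1.0)
minority_kept = retained_fraction(minority[0], clip_norm=1.0)
print(majority_kept, minority_kept)
```

Under these assumed inputs, clipping leaves the majority gradients untouched and shrinks each minority gradient to a tenth of its size, and the added noise has the same scale regardless of how few examples a group contributes.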

accuracy, deep learning, neural network, (20 more...)

arXiv.org Machine Learning

1905.12101

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.93)
