Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System

Oct-30-2024–arXiv.org Artificial Intelligence

Recent approaches in machine learning often solve a task using a composition of multiple models or agentic architectures. When targeting a composed system with adversarial attacks, it might not be computationally or informationally feasible to train an end-to-end proxy model or a proxy model for every component of the system. We introduce a method to craft an adversarial attack against the overall multi-model system when we only have a proxy model for the final black-box model, and when the transformation applied by the initial models can make the adversarial perturbations ineffective. Current methods handle this by applying many copies of the first model/transformation to an input and then re-use a standard adversarial attack by averaging gradients, or learning a proxy model for both stages. To our knowledge, this is the first attack specifically designed for this threat model and our method has a substantially higher attack success rate (80% vs 25%) and contains 9.4% smaller perturbations (MSE) compared to prior state-of-the-art methods. Our experiments focus on a supervised image pipeline, but we are confident the attack will generalize to other multi-model settings [e.g. a mix of open/closed source foundation models], or agentic systems

arxiv, perturbation, transformation, (14 more...)

arXiv.org Artificial Intelligence

Oct-30-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.14)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.89)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Natural Language > Large Language Model (1.00)
    - Representation & Reasoning > Agents (0.68)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found