Black-box Backdoor Defense via Zero-shot Image Purification
Neural Information Processing Systems
Backdoor attacks inject poisoned samples into the training data, causing poisoned inputs to be misclassified during a model's deployment. Defending against such attacks is challenging, especially for real-world black-box models where only query access is permitted. In this paper, we propose a novel defense framework against backdoor attacks through Zero-shot Image Purification (ZIP). Our framework can be applied to poisoned models without requiring internal information about the model or any prior knowledge of the clean/poisoned samples. Our defense framework involves two steps.
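The abstract's core idea, purifying each input before it reaches the black-box model so query access alone suffices, can be sketched minimally. The snippet below is illustrative only: `purify` is a hypothetical stand-in (a simple mean filter) for ZIP's actual purification procedure, which the abstract does not detail, and `defended_query` shows how a defense wraps a query-only model.

```python
import numpy as np

def purify(image, blur_strength=2):
    # Hypothetical stand-in for ZIP's purification step: a local mean
    # filter that attenuates high-frequency trigger patterns while
    # preserving coarse image content. The real framework's procedure
    # is not specified in the abstract.
    k = 2 * blur_strength + 1
    padded = np.pad(image, blur_strength, mode="edge")
    out = np.empty(image.shape, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def defended_query(model_query, image):
    # The black-box model is reachable only through `model_query`;
    # the defense never inspects its internals, it just purifies
    # every input before forwarding the query.
    return model_query(purify(image))
```

A deployment would route every inference request through `defended_query`, keeping the poisoned model itself untouched, which is what makes the defense viable under query-only access.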