Leveraging Extracted Model Adversaries for Improved Black Box Attacks

Nov-2-2020–arXiv.org Artificial Intelligence

We present a method for generating adversarial inputs against black box models for reading-comprehension-based question answering. Our approach has two steps. First, we approximate a victim black box model via model extraction (Krishna et al., 2020). Second, we use our own white box method to generate input perturbations that cause the approximate model to fail. These perturbed inputs are then used against the victim. In experiments, we find that our method improves on the efficacy of the AddAny attack (a white box attack) performed on the approximate model by 25% F1, and on the AddSent attack (a black box attack) by 11% F1 (Jia and Liang, 2017).
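The two-step pipeline described in the abstract (extract an approximate model, attack it with white box access, then transfer the perturbed inputs to the victim) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not the paper's models or attack: the "victim" is a toy word-overlap sentence selector, extraction is done with one-word probe queries, and the attack greedily appends a distractor sentence built from question words. The names `make_overlap_model`, `extract_ignored_words`, and `attack_surrogate` are hypothetical.

```python
from typing import Callable, List, Optional, Set

Sentence = List[str]
Model = Callable[[List[str], List[Sentence]], int]


def make_overlap_model(ignored: Set[str]) -> Model:
    """Toy QA model: pick the sentence sharing the most non-ignored words with the question."""
    def predict(question: List[str], sentences: List[Sentence]) -> int:
        q = set(question) - ignored
        scores = [len(q & set(s)) for s in sentences]
        return scores.index(max(scores))  # ties go to the earliest sentence
    return predict


# Black box victim: the attacker may call it but never reads _HIDDEN_IGNORED.
_HIDDEN_IGNORED = {"the", "did", "in", "was"}
victim = make_overlap_model(_HIDDEN_IGNORED)


def extract_ignored_words(black_box: Model, vocab: List[str]) -> Set[str]:
    """Step 1 (toy extraction): probe the victim one word at a time.

    The probe asks the question [w] over the context [["<pad>"], [w]].
    The victim returns index 1 only if it pays attention to w.
    """
    return {w for w in vocab if black_box([w], [["<pad>"], [w]]) == 0}


def attack_surrogate(question: List[str], sentences: List[Sentence], gold: int,
                     surrogate: Model, surrogate_ignored: Set[str]) -> Optional[List[Sentence]]:
    """Step 2 (toy white box attack): grow a distractor sentence from words the
    surrogate attends to until the surrogate no longer selects the gold sentence."""
    distractor: Sentence = []
    for w in question:
        if w in surrogate_ignored or w in distractor:
            continue
        distractor.append(w)
        perturbed = sentences + [list(distractor)]
        if surrogate(question, perturbed) != gold:
            return perturbed
    return None  # no successful perturbation found


question = "where did the pilot land the plane".split()
sentences = ["the pilot landed in austin".split(), "the weather was clear".split()]
gold = 0  # index of the sentence that contains the answer

vocab = sorted(set(question) | {w for s in sentences for w in s})
learned_ignored = extract_ignored_words(victim, vocab)
surrogate = make_overlap_model(learned_ignored)

perturbed = attack_surrogate(question, sentences, gold, surrogate, learned_ignored)

# Transfer: the perturbation found on the surrogate is replayed against the victim.
victim_before = victim(question, sentences)
victim_after = victim(question, perturbed) if perturbed is not None else victim_before
print("victim before:", victim_before, "victim after:", victim_after)
```

In this toy setting the surrogate recovers the victim's behavior exactly, so the perturbation transfers trivially. With real models the approximation is imperfect, and how well perturbations transfer is the empirical question the reported F1 numbers address.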

machine learning, natural language, victim model
