Evaluations and Methods for Explanation through Robustness Analysis

Hsieh, Cheng-Yu; Yeh, Chih-Kuan; Liu, Xuanqing; Ravikumar, Pradeep; Kim, Seungyeon; Kumar, Sanjiv; Hsieh, Cho-Jui

May 31, 2020, arXiv.org Machine Learning

Among the many ways of interpreting a machine learning model, measuring the importance of a set of features tied to a prediction is probably one of the most intuitive. In this paper, we establish the link between a set of features and a prediction with a new evaluation criterion, robustness analysis, which measures the minimum distortion distance of an adversarial perturbation. By measuring the tolerance level for an adversarial attack, we can extract a set of features that provides the most robust support for a prediction, and we can also extract a set of features that contrasts the current prediction with a target class by using a targeted adversarial attack. Applying this methodology to various prediction tasks across multiple domains, we observe that the derived explanations capture the significant feature sets both qualitatively and quantitatively.
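To make the idea of "minimum distortion distance" restricted to a feature set concrete, here is a minimal sketch on a toy linear classifier. It is an illustration under stated assumptions, not the paper's implementation: the function name `min_distortion`, the linear model, and the choice of the L2 norm are all hypothetical. For a linear score w·x + b, the smallest L2 perturbation confined to a feature subset S that reaches the decision boundary has the closed form |w·x + b| / ||w_S||, so a subset that needs less distortion to change the prediction carries more of the decision.

```python
import math

def min_distortion(weights, bias, x, perturbable):
    """Smallest L2 perturbation, restricted to the `perturbable` indices,
    that moves x onto the decision boundary of a linear classifier.

    Hypothetical helper for illustration; assumes score = w.x + b.
    """
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    # Only the weights of perturbable features can change the score.
    restricted_norm = math.sqrt(sum(weights[i] ** 2 for i in perturbable))
    if restricted_norm == 0.0:
        return math.inf  # these features cannot change the prediction
    return abs(score) / restricted_norm

# Toy model: the third feature has zero weight and is irrelevant.
weights, bias, x = [3.0, 4.0, 0.0], 0.0, [1.0, 1.0, 1.0]

all_features = min_distortion(weights, bias, x, [0, 1, 2])
only_first = min_distortion(weights, bias, x, [0])
only_third = min_distortion(weights, bias, x, [2])
```

In this toy case, perturbing only the irrelevant third feature can never change the prediction, while allowing all features gives the smallest distortion. For nonlinear models no closed form exists in general, and the distance would have to be estimated with an adversarial attack, as the abstract describes.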

explanation, machine learning, natural language

Country:
- Asia > Taiwan (0.04)
- North America > United States
  - Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Italy
  - Marche > Ancona Province > Ancona (0.04)

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (0.55)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.68)
    - Statistical Learning (0.68)