Adversarial Detection with Model Interpretation
Machine learning (ML) systems have been increasingly applied in web security applications such as spammer detection, malware detection and fraud detection. These applications have an intrinsic adversarial nature where intelligent attackers can adaptively change their behaviors to avoid being detected by the deployed detectors. Existing efforts against adversaries are usually limited by the type of applied ML models or the specific applications such as image classification. Additionally, the working mechanisms of ML models usually cannot be well understood by users, which in turn impede them from understanding the vulnerabilities of models nor improving their robustness. To bridge the gap, in this paper, we propose to investigate whether model interpretation could potentially help adversarial detection.
Nov-24-2018, 00:25:02 GMT