I-MAD: A Novel Interpretable Malware Detector Using Hierarchical Transformer

Li, Miles Q., Fung, Benjamin C. M., Charland, Philippe, Ding, Steven H. H.

Sep-26-2019–arXiv.org Machine Learning

Abstract--Malware poses tremendous threats to computer users nowadays. Since signature-based malware detection methods are neither effective nor efficient at identifying new malware, many machine learning-based methods have been proposed. A common disadvantage of existing machine learning methods is that they are not based on understanding the full semantic meaning of the assembly code of an executable. Instead, they use short assembly code fragments, because assembly code is usually too long to be modelled in its entirety. Another disadvantage is that those methods have either inferior performance or poor interpretability. To overcome these challenges, we propose an Interpretable MAlware Detector (I-MAD), which achieves state-of-the-art performance on static malware detection with excellent interpretability. It integrates a hierarchical Transformer network that can understand assembly code at the basic block, function, and executable level. It also integrates our novel interpretable feed-forward neural network, which provides interpretations for its detection results by pointing out the impact of each feature on the prediction. Experimental results show that our model significantly outperforms previous state-of-the-art static malware detection models and presents meaningful interpretations.

Since the Internet has become an integral part of people's lives, the large volume of malware spreading on it poses tremendous threats to billions of netizens. Recognizing malware samples downloaded by legitimate users in a timely manner is thus of crucial importance for their protection. Signature-based malware detection methods are widely used in antivirus products [1]. With the signatures extracted by malware analysts, known malware samples or some of their variants can be precisely recognized. However, with obfuscation techniques or even a change of compiler, it is easy to create variants of known malware that perform the same attack but have different executable code at the byte level. As a result, the previously crafted signatures can no longer recognize them [2]. Furthermore, signature-based detection is also ineffective at detecting new and unseen malware in most cases.
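
The brittleness of signatures described above can be illustrated with a minimal sketch. It assumes the simplest possible signature, a SHA-256 hash of the whole byte sequence; real antivirus signatures are typically richer (byte patterns, wildcards), and the paper does not specify any particular scheme. The names `signature`, `signature_db`, and `is_known_malware`, as well as the two byte strings, are hypothetical and chosen only for illustration.

```python
import hashlib


def signature(data: bytes) -> str:
    """Assumed signature scheme: SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical x86 snippet standing in for a known malicious sample:
# push ebp; mov ebp, esp; sub esp, 0x10; mov dword [ebp-4], 0; leave; ret
known_sample = bytes.fromhex("5589e583ec10c745fc00000000c9c3")

# Same behaviour with a single NOP (0x90) inserted after `mov ebp, esp`,
# mimicking a trivial obfuscation or a compiler difference.
variant = bytes.fromhex("5589e59083ec10c745fc00000000c9c3")

# Signature database built from samples an analyst has already seen.
signature_db = {signature(known_sample)}


def is_known_malware(data: bytes) -> bool:
    """Flag a sample only if its signature is already in the database."""
    return signature(data) in signature_db


if __name__ == "__main__":
    print("known sample detected:", is_known_malware(known_sample))
    print("variant detected:", is_known_malware(variant))
```

Under these assumptions, the original sample matches the database while the one-byte variant does not, even though both behave identically. This is the gap that learning-based detectors, which model the semantics of the code rather than its exact bytes, aim to close.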


