We consider the problem of detecting malware with deep learning models, where the malware may be combined with significant amounts of benign code. Examples of this include piggybacking and trojan horse attacks on a system, where malicious behavior is hidden within a useful application. Such added flexibility in augmenting the malware enables significantly more code obfuscation. Hence we focus on the use of static features, particularly Intents, Permissions, and API calls, which we presume cannot be ultimately hidden from the Android system, but only augmented with yet more such features. We first train a deep neural network classifier for malware classification using features of benign and malware samples. Then we demonstrate a steep increase in false negative rate (i.e., attacks succeed), simply by randomly adding features of a benign app to malware. Finally we test the use of data augmentation to harden the classifier against such attacks. We find that for API calls, it is possible to reject the vast majority of attacks, where using Intents or Permissions is less successful.
Pokémon Go players are packing America's sidewalks, trying desperately to catch'em all (Pokémon for newbies, that is) by exploring their neighborhoods with the augmented-reality app. But there may be one thing lurking in their games that they really don't want to catch: malware. It turns out that even hackers are going crazy about Pokémon Go, according to cybersecurity researchers -- and there's a fishy Android version of the game that would make the villainous Team Rocket proud. The fake version is almost impossible to distinguish from the actual game when you start it up. But it secretly includes a kind of malware called DroidJack that can basically take over your device and steal information from it, the cybersecurity firm Proofpoint reported in a recent blog post.
Android malware is one of the most dangerous threats on the internet, and it's been on the rise for several years. Despite significant efforts in detecting and classifying android malware from innocuous android applications, there is still a long way to go. As a result, there is a need to provide a basic understanding of the behavior displayed by the most common Android malware categories and families. Each Android malware family and category has a distinct objective. As a result, it has impacted every corporate area, including healthcare, banking, transportation, government, and e-commerce. In this paper, we presented two machine-learning approaches for Dynamic Analysis of Android Malware: one for detecting and identifying Android Malware Categories and the other for detecting and identifying Android Malware Families, which was accomplished by analyzing a massive malware dataset with 14 prominent malware categories and 180 prominent malware families of CCCS-CIC-AndMal2020 dataset on Dynamic Layers. Our approach achieves in Android Malware Category detection more than 96 % accurate and achieves in Android Malware Family detection more than 99% accurate. Our approach provides a method for high-accuracy Dynamic Analysis of Android Malware while also shortening the time required to analyze smartphone malware.
In this paper, we present a comparative analysis of benign and malicious Android applications, based on static features. In particular, we focus our attention on the permissions requested by an application. We consider both binary classification of malware versus benign, as well as the multiclass problem, where we classify malware samples into their respective families. Our experiments are based on substantial malware datasets and we employ a wide variety of machine learning techniques, including decision trees and random forests, support vector machines, logistic model trees, AdaBoost, and artificial neural networks. We find that permissions are a strong feature and that by careful feature engineering, we can significantly reduce the number of features needed for highly accurate detection and classification.
Machine-learning models have been recently used for detecting malicious Android applications, reporting impressive performances on benchmark datasets, even when trained only on features statically extracted from the application, such as system calls and permissions. However, recent findings have highlighted the fragility of such in-vitro evaluations with benchmark datasets, showing that very few changes to the content of Android malware may suffice to evade detection. How can we thus trust that a malware detector performing well on benchmark data will continue to do so when deployed in an operating environment? To mitigate this issue, the most popular Android malware detectors use linear, explainable machine-learning models to easily identify the most influential features contributing to each decision. In this work, we generalize this approach to any black-box machine- learning model, by leveraging a gradient-based approach to identify the most influential local features. This enables using nonlinear models to potentially increase accuracy without sacrificing interpretability of decisions. Our approach also highlights the global characteristics learned by the model to discriminate between benign and malware applications. Finally, as shown by our empirical analysis on a popular Android malware detection task, it also helps identifying potential vulnerabilities of linear and nonlinear models against adversarial manipulations.