
Can Machine Learning Model with Static Features be Fooled: an Adversarial Machine Learning Approach

arXiv.org Artificial Intelligence

The widespread adoption of smartphones dramatically increases the risk of attacks and the spread of mobile malware, especially on the Android platform. Machine learning based solutions have already been used as a tool to supersede signature based anti-malware systems. However, malware authors can exploit the static attributes such systems rely on: every Android application has a Jar-like APK format, an archive file containing the Android manifest and classes.dex. The manifest holds information about the application structure, including the list of hardware components and permissions required by the application (permissions that users must accept for successful installation), while the application code is saved as classes.dex. By perturbing these attributes, an adversary can craft a malware sample that is statistically identical to a benign sample; to do so, adversaries adopt adversarial machine learning (AML) algorithms to design an example set, called poison data, which is used to fool machine learning models. Hence, to evaluate the vulnerability of machine learning algorithms in malware detection, we propose five different attack scenarios to perturb malicious applications (apps). Under these attacks, the classifier inappropriately fits its discriminant function on the set of data points, eventually yielding a higher misclassification rate. Further, to distinguish adversarial examples from benign samples, we propose two defense mechanisms. To validate our attacks and solutions, we test our model on three different datasets, using various classifier algorithms, and compare the results. Promising results show that the evasive variants generated by our attack models, when used to harden the developed anti-malware system, improve the detection of adversarial examples. This raises broader questions: do we require retraining of the current ML model to design adversary-aware learning algorithms, and how should countermeasure solutions be properly tested and validated in a real-world network? The goal of this paper is to shed light on these questions by presenting adversary-aware approaches.

Keywords: adversarial machine learning · malware detection · poison attacks · adversarial example · Jacobian algorithm.
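As a rough illustration of the kind of evasion this abstract describes, the sketch below trains a classifier on binary static feature vectors (for example, requested permissions) and then greedily adds features to a malicious sample until the model scores it as benign. This is not the paper's actual attack: the synthetic data, the logistic-regression model, and the greedy bit-flipping heuristic are assumptions made only for demonstration.

```python
# Hypothetical sketch: greedy feature-addition evasion of a static-feature
# malware classifier. Synthetic data stands in for real permission vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_features = 50                      # e.g., one bit per requested permission

# Synthetic binary feature vectors: class 1 = malware, class 0 = benign.
X_mal = (rng.random((200, n_features)) < 0.30).astype(int)
X_ben = (rng.random((200, n_features)) < 0.10).astype(int)
X = np.vstack([X_mal, X_ben])
y = np.array([1] * 200 + [0] * 200)

clf = LogisticRegression(max_iter=1000).fit(X, y)

def evade(x, clf, budget=10):
    """Greedily flip 0 -> 1 bits (adding permissions keeps the app runnable)
    until the classifier labels the sample benign or the budget is spent."""
    x = x.copy()
    for _ in range(budget):
        if clf.predict([x])[0] == 0:
            break
        candidates = np.where(x == 0)[0]
        if len(candidates) == 0:
            break
        # Pick the absent feature whose addition lowers the malware score most.
        scores = []
        for j in candidates:
            x_try = x.copy()
            x_try[j] = 1
            scores.append(clf.predict_proba([x_try])[0, 1])
        x[candidates[int(np.argmin(scores))]] = 1
    return x

sample = X_mal[0]
adv = evade(sample, clf)
print("original label:", clf.predict([sample])[0],
      "-> perturbed label:", clf.predict([adv])[0])
print("bits added:", int((adv - sample).sum()))
```

Restricting the perturbation to additions only is a common assumption in this setting, since removing permissions or API calls may break the app's functionality.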


Similarity-based Android Malware Detection Using Hamming Distance of Static Binary Features

arXiv.org Machine Learning

In this paper, we develop four malware detection methods that use the Hamming distance to measure similarity between samples: first nearest neighbors (FNN), all nearest neighbors (ANN), weighted all nearest neighbors (WANN), and k-medoid based nearest neighbors (KMNN). Our methods raise an alarm as soon as an Android app is detected as malicious, which helps to prevent the spread of detected malware on a broader scale. We provide a detailed description of the proposed detection methods and related algorithms, and include an extensive analysis to assess the suitability of our similarity-based detection approach. We perform our experiments on three datasets containing benign and malware Android apps: Drebin, Contagio, and Genome. To corroborate the actual effectiveness of our classifiers, we carry out performance comparisons with state-of-the-art classification and malware detection algorithms, namely the Mixed and Separated solutions, the program dissimilarity measure based on entropy (PDME), and the FalDroid algorithm. We run the experiments on three types of features (API, intent, and permission features) across these datasets. The results confirm that the accuracy rates of the proposed algorithms exceed 90%, and in some cases (i.e., with API features) exceed 99%, which is comparable with existing state-of-the-art solutions.
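A minimal sketch of the first-nearest-neighbor (FNN) idea follows, assuming binary static feature vectors such as permission indicators. The toy data and the plain nearest-neighbor rule are illustrative only and do not reproduce the paper's ANN, WANN, or KMNN variants.

```python
# Hypothetical sketch of FNN-style detection: label a test app with the label
# of its Hamming-nearest training sample over binary static features.
import numpy as np

def hamming_distance(a, b):
    """Number of differing bits between two binary feature vectors."""
    return int(np.count_nonzero(a != b))

def fnn_predict(X_train, y_train, x_test):
    """First-nearest-neighbor prediction under Hamming distance."""
    dists = [hamming_distance(x_test, x) for x in X_train]
    return y_train[int(np.argmin(dists))]

# Toy binary feature matrix: rows are apps, columns are permission/API bits.
X_train = np.array([
    [1, 1, 0, 0, 1],   # malware-like
    [1, 0, 1, 0, 1],   # malware-like
    [0, 0, 0, 1, 0],   # benign-like
    [0, 1, 0, 1, 0],   # benign-like
])
y_train = np.array([1, 1, 0, 0])    # 1 = malware, 0 = benign

x_new = np.array([1, 1, 1, 0, 1])
print("predicted label:", fnn_predict(X_train, y_train, x_new))  # 1 -> alarm
```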


Deep Partition Aggregation: Provable Defense against General Poisoning Attacks

arXiv.org Machine Learning

Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier. A provable defense provides a certificate for each test sample, which is a lower bound on the magnitude of any adversarial distortion of the training set that can corrupt the test sample's classification. We propose two provable defenses against poisoning attacks: (i) Deep Partition Aggregation (DPA), a certified defense against a general poisoning threat model, defined as the insertion or deletion of a bounded number of samples to the training set -- by implication, this threat model also includes arbitrary distortions to a bounded number of images and/or labels; and (ii) Semi-Supervised DPA (SS-DPA), a certified defense against label-flipping poisoning attacks. DPA is an ensemble method where base models are trained on partitions of the training set determined by a hash function. DPA is related to subset aggregation, a well-studied ensemble method in classical machine learning. DPA can also be viewed as an extension of randomized ablation (Levine & Feizi, 2020a), a certified defense against sparse evasion attacks, to the poisoning domain. Our label-flipping defense, SS-DPA, uses a semi-supervised learning algorithm as its base classifier model: we train each base classifier using the entire unlabeled training set in addition to the labels for a partition. SS-DPA outperforms the existing certified defense for label-flipping attacks (Rosenfeld et al., 2020). SS-DPA certifies >= 50% of test images against 675 label flips (vs. < 200 label flips with the existing defense) on MNIST and 83 label flips on CIFAR-10. Against general poisoning attacks (no prior certified defense), DPA certifies >= 50% of test images against > 500 poison image insertions on MNIST, and nine insertions on CIFAR-10. These results establish new state-of-the-art provable defenses against poison attacks.
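The sketch below illustrates the partition-and-vote structure of DPA on a toy problem. The SHA-256 hash, the logistic-regression base learner, and the simplified certificate shown here (half the gap between the top two vote counts) are assumptions for illustration; the paper's exact bound includes a tie-breaking term that is omitted.

```python
# Hypothetical sketch of Deep Partition Aggregation (DPA): hash-partition the
# training set, train one base classifier per partition, and certify a test
# prediction by the vote gap across base classifiers.
import hashlib
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
k = 11                               # number of partitions / base models

# Synthetic 2-class data.
X = rng.normal(size=(1100, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

def partition_of(x, k):
    """Deterministic hash of the sample itself decides its partition, so a
    single inserted or deleted sample can affect at most one base model."""
    digest = hashlib.sha256(x.tobytes()).hexdigest()
    return int(digest, 16) % k

parts = np.array([partition_of(x, k) for x in X])
models = []
for p in range(k):
    idx = parts == p
    models.append(LogisticRegression().fit(X[idx], y[idx]))

def dpa_predict_and_certify(x):
    votes = np.bincount([int(m.predict([x])[0]) for m in models], minlength=2)
    pred = int(np.argmax(votes))
    gap = votes[pred] - np.max(np.delete(votes, pred))
    # Simplified certificate: each poisoned sample can flip at most one vote,
    # so roughly gap // 2 insertions/deletions cannot change the prediction.
    return pred, int(gap) // 2

pred, radius = dpa_predict_and_certify(rng.normal(size=5))
print("prediction:", pred, "| certified against ~", radius, "poisoned samples")
```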


Defending Distributed Classifiers Against Data Poisoning Attacks

arXiv.org Machine Learning

Support Vector Machines (SVMs) are vulnerable to targeted training data manipulations such as poisoning attacks and label flips. By carefully manipulating a subset of training samples, the attacker forces the learner to compute an incorrect decision boundary, thereby causing misclassifications. Considering the increased importance of SVMs in engineering and life-critical applications, we develop a novel defense algorithm that improves resistance against such attacks. Local Intrinsic Dimensionality (LID) is a promising metric that characterizes the outlierness of data samples. In this work, we introduce a new approximation of LID called K-LID that uses kernel distance in the LID calculation, which allows LID to be computed in high-dimensional transformed spaces. Using K-LID as a distinguishing characteristic, we introduce a weighted SVM that de-emphasizes the effect of suspicious data samples on the SVM decision boundary. Each sample is weighted by how likely its K-LID value is to come from the benign K-LID distribution rather than the attacked K-LID distribution. We then demonstrate how the proposed defense can be applied to a distributed SVM framework through a case study on an SDR-based surveillance system. Experiments with benchmark data sets show that the proposed defense reduces classification error rates substantially (by 10% on average).
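A sketch of the general idea follows, assuming the standard maximum-likelihood LID estimator computed over RBF-kernel-induced distances and a simple monotone down-weighting of high-K-LID samples. The paper's actual scheme weights each sample by a likelihood ratio between benign and attacked K-LID distributions, which is not reproduced here; the label noise, kernel parameter, and weighting function below are assumptions.

```python
# Hypothetical sketch: estimate K-LID of each training sample from kernel
# distances and down-weight outlier-like samples when fitting an SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] > 0).astype(int)
y[:15] = 1 - y[:15]                  # crude stand-in for flipped/poisoned labels

gamma = 0.1
K = rbf_kernel(X, X, gamma=gamma)
# Kernel-induced distance: d(x, x')^2 = k(x,x) - 2*k(x,x') + k(x',x') = 2 - 2K.
D = np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))

def k_lid(dist_row, k=20):
    """MLE LID estimate from the k nearest kernel distances (excluding self)."""
    r = np.sort(dist_row)[1:k + 1]           # skip the zero self-distance
    r = np.maximum(r, 1e-12)
    return -1.0 / np.mean(np.log(r / r[-1]))

lids = np.array([k_lid(D[i]) for i in range(len(X))])

# Simple monotone down-weighting: samples with unusually high K-LID (more
# outlier-like) get less influence on the decision boundary.
weights = 1.0 / (1.0 + (lids / np.median(lids)) ** 2)

clf = SVC(kernel="rbf", gamma=gamma)
clf.fit(X, y, sample_weight=weights)
print("training accuracy with K-LID weights:", round(clf.score(X, y), 3))
```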


The Hammer and the Nut: Is Bilevel Optimization Really Needed to Poison Linear Classifiers?

arXiv.org Artificial Intelligence

One of the most concerning threats to modern AI systems is data poisoning, where the attacker injects maliciously crafted training data to corrupt the system's behavior at test time. Availability poisoning is a particularly worrisome subset of poisoning attacks in which the attacker aims to cause a Denial-of-Service (DoS) condition. However, the state-of-the-art algorithms are computationally expensive because they try to solve a complex bi-level optimization problem (the "hammer"). We observe that under particular conditions, namely when the target model is linear (the "nut"), such computationally costly procedures can be avoided. We propose a counter-intuitive but efficient heuristic that allows contaminating the training set so that the target system's performance is highly compromised. We further suggest a re-parameterization trick to decrease the number of variables to be optimized. Finally, we demonstrate that, under the considered settings, our framework achieves comparable, or even better, performance in terms of the attacker's objective while being significantly more computationally efficient.
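To make the availability-poisoning threat model concrete, here is a toy sketch against a linear classifier. The flip-label heuristic below (inject copies of confidently classified points with inverted labels) is a generic stand-in and is not the specific heuristic or re-parameterization trick proposed in the paper; the synthetic data and budget are assumptions.

```python
# Hypothetical sketch of availability poisoning against a linear classifier:
# inject a fraction of training points with inverted labels and measure the
# drop in clean test accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def poison(X_tr, y_tr, clf, budget):
    """Simple heuristic: duplicate the points the clean model classifies most
    confidently and give the copies the opposite label."""
    conf = np.abs(clf.decision_function(X_tr))
    idx = np.argsort(conf)[-budget:]          # most confidently classified
    X_p = np.vstack([X_tr, X_tr[idx]])
    y_p = np.concatenate([y_tr, 1 - y_tr[idx]])
    return X_p, y_p

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
X_p, y_p = poison(X_tr, y_tr, clean, budget=int(0.2 * len(X_tr)))
poisoned = LogisticRegression(max_iter=1000).fit(X_p, y_p)

print("clean test accuracy:   ", round(clean.score(X_te, y_te), 3))
print("poisoned test accuracy:", round(poisoned.score(X_te, y_te), 3))
```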