
Collaborating Authors

 Al-Dujaili, Abdullah


Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML

arXiv.org Machine Learning

In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former requires only a small number of function queries and the latter needs just a one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. On the application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers.
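The zeroth-order primitive at the heart of this framework can be illustrated with a toy example. The sketch below is my own illustration, not the paper's implementation: it shows only the descent half of the descent-ascent scheme, and the objective, step size, smoothing parameter, and query budget are arbitrary choices. It estimates a gradient from function values alone via random-direction finite differences, then takes plain one-step descent updates with that estimate:

```python
import numpy as np

def zo_gradient(f, x, rng, num_queries=10, mu=1e-3):
    """Random-direction finite-difference gradient estimate that uses only
    function-value queries (no access to true gradients)."""
    fx = f(x)
    grad = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.size)
        grad += (f(x + mu * u) - fx) / mu * u
    return grad / num_queries

# Toy objective minimized at the all-ones vector.
f = lambda x: np.sum((x - 1.0) ** 2)
rng = np.random.default_rng(0)
x = np.zeros(5)
for _ in range(300):
    x -= 0.05 * zo_gradient(f, x, rng)  # one-step descent with the ZO estimate
```

Each iteration spends `num_queries + 1` oracle calls; in the min-max setting, an analogous one-step ascent update would alternate with this descent step.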


There are No Bit Parts for Sign Bits in Black-Box Attacks

arXiv.org Machine Learning

Machine learning models are vulnerable to adversarial examples. In this paper, we are concerned with black-box adversarial attacks, where only loss-oracle access to a model is available. At the heart of a black-box adversarial attack is the gradient estimation problem with query complexity O(n), where n is the number of data features. Recent work has developed query-efficient gradient estimation schemes by exploiting data- and/or time-dependent priors. Practically, sign-based optimization has been shown to be effective both in training deep nets and in attacking them in a white-box setting. Therefore, instead of a gradient estimation view of black-box adversarial attacks, we view the black-box adversarial attack problem as estimating the gradient's sign bits. This shifts the view from continuous to binary black-box optimization and theoretically guarantees a lower query complexity of $\Omega(n/ \log_2(n+1))$ when given access to a Hamming loss oracle. We present three algorithms to estimate the gradient sign bits given a limited number of queries to the loss oracle. Using one of our proposed algorithms to craft black-box adversarial examples, we run evasion-rate experiments on standard models trained on the MNIST, CIFAR10, and IMAGENET datasets, setting new state-of-the-art results for query-efficient black-box attacks. Averaged over all the datasets and metrics, our attack fails $3.8\times$ less often and spends in total $2.5\times$ fewer queries than the current state-of-the-art attacks combined, given a budget of 10,000 queries per attack attempt. On a public MNIST black-box attack challenge, our attack achieves the highest evasion rate, surpassing all submitted attacks. Notably, our attack is hyperparameter-free (no hyperparameter tuning) and does not employ any data-/time-dependent prior, the latter fact suggesting that the number of queries could be reduced further.
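To make the sign-bit view concrete, here is a deliberately naive baseline sketch (my own illustration, not one of the paper's three algorithms): it recovers the sign of each partial derivative with one finite-difference query per coordinate, i.e. O(n) queries, which is exactly the cost the paper's group-flip / Hamming-loss schemes are designed to beat.

```python
import numpy as np

def estimate_sign_bits(loss, x, delta=1e-3):
    """Naive O(n) sign recovery: one loss query per coordinate, comparing
    against the base loss to read off the sign of each partial derivative."""
    n = x.size
    base = loss(x)
    signs = np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = delta
        signs[i] = 1.0 if loss(x + e) >= base else -1.0
    return signs

# Toy linear loss with a known gradient sign pattern.
g = np.array([2.0, -1.0, 0.5, -3.0])
loss = lambda x: g @ x
s = estimate_sign_bits(loss, np.zeros(4))
print(s)  # [ 1. -1.  1. -1.]
```

A sign vector recovered this way already suffices for a SignSGD-style perturbation step `x + eps * s`; the paper's contribution is recovering (most of) `s` with far fewer queries.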


Multivariate Time-series Similarity Assessment via Unsupervised Representation Learning and Stratified Locality Sensitive Hashing: Application to Early Acute Hypotensive Episode Detection

arXiv.org Artificial Intelligence

Timely prediction of clinically critical events in the Intensive Care Unit (ICU) is important for improving care and survival rates. Most existing approaches apply various classification methods to statistical features explicitly extracted from vital signals. In this work, we propose to eliminate the high cost of engineering hand-crafted features from multivariate time-series of physiologic signals by learning their representation with a sequence-to-sequence auto-encoder. We then propose to hash the learned representations to enable signal similarity assessment for the prediction of critical events. We apply this methodological framework to predict Acute Hypotensive Episodes (AHE) on a large and diverse dataset of vital signal recordings. Experiments demonstrate the ability of the presented framework to accurately predict an upcoming AHE.
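The hashing stage can be sketched with the classic random-hyperplane (cosine) LSH family. This is a minimal illustration under my own assumptions (random vectors stand in for the auto-encoder's learned representations, and the paper's stratified variant is not reproduced): similar representations collide on most hash bits, so Hamming distance between codes proxies signal similarity.

```python
import numpy as np

def simhash(vectors, num_bits=16, seed=1):
    """Random-hyperplane LSH: project onto random directions and keep the
    sign bit of each projection as one bit of the hash code."""
    planes = np.random.default_rng(seed).standard_normal((vectors.shape[1], num_bits))
    return (vectors @ planes > 0).astype(np.uint8)

rng = np.random.default_rng(0)
a = rng.standard_normal(32)              # stand-in for a learned representation
b = a + 0.05 * rng.standard_normal(32)   # near-duplicate representation
c = rng.standard_normal(32)              # unrelated representation
codes = simhash(np.stack([a, b, c]))
hamming = lambda u, v: int((u != v).sum())
```

Querying then reduces to retrieving stored recordings whose codes fall within a small Hamming radius of the new signal's code.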


AST-Based Deep Learning for Detecting Malicious PowerShell

arXiv.org Machine Learning

With the celebrated success of deep learning, some attempts to develop effective methods for detecting malicious PowerShell programs employ neural nets in a traditional natural language processing setup while others employ convolutional neural nets to detect obfuscated malicious commands at a character level. While these representations may express salient PowerShell properties, our hypothesis is that tools from static program analysis will be more effective. We propose a hybrid approach combining traditional program analysis (in the form of abstract syntax trees) and deep learning. This poster presents preliminary results of a fundamental step in our approach: learning embeddings for nodes of PowerShell ASTs. We classify malicious scripts by family type and explore embedded program vector representations.
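The first step of this pipeline, turning a script into AST node tokens for an embedding layer, can be illustrated in miniature. The sketch below is only an analogy: it uses Python's own `ast` module on a toy Python snippet, whereas the poster targets PowerShell ASTs produced by a PowerShell parser.

```python
import ast
from collections import Counter

def ast_node_types(source):
    """Parse source and return node-type tokens in tree-walk order;
    token sequences like these are what an embedding layer consumes."""
    return [type(n).__name__ for n in ast.walk(ast.parse(source))]

tokens = ast_node_types("x = fetch(url)\nexec(x)")  # toy stand-in script
print(Counter(tokens).most_common(3))
```

Each node-type token would then be mapped to a learned vector (e.g. word2vec-style), and a script's representation built from its token sequence.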


On Visual Hallmarks of Robustness to Adversarial Malware

arXiv.org Machine Learning

A central challenge of adversarial learning is to interpret the resulting hardened model. In this contribution, we ask how robust generalization can be visually discerned and whether a concise view of the interactions between a hardened decision map and input samples is possible. We first provide a means of visually comparing a hardened model's loss behavior on the adversarial variants generated during training versus its loss behavior on adversarial variants generated from other sources. This allows us to confirm that the association between the observed flatness of a loss landscape and generalization, seen with naturally trained models, extends to adversarially hardened models and robust generalization. To complement these means of interpreting model-parameter robustness, we also use self-organizing maps to provide a visual means of superimposing adversarial and natural variants on a model's decision space, allowing the model's global robustness to be examined comprehensively.
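The loss-flatness probe described above amounts to sampling the loss along rays through parameter space. A minimal sketch, under my own assumptions (toy quadratic losses stand in for a trained network's loss surface):

```python
import numpy as np

def loss_profile(loss, w, direction, radius=1.0, steps=21):
    """Sample the loss along a ray through parameter space around w;
    flatter profiles are the visual hallmark associated with robustness."""
    alphas = np.linspace(-radius, radius, steps)
    d = direction / np.linalg.norm(direction)
    return alphas, np.array([loss(w + a * d) for a in alphas])

# Toy losses with a "sharp" and a "flat" minimum at w = 0.
sharp = lambda w: 10.0 * np.sum(w ** 2)
flat = lambda w: 0.1 * np.sum(w ** 2)
w0 = np.zeros(3)
u = np.ones(3)
_, ys = loss_profile(sharp, w0, u)
_, yf = loss_profile(flat, w0, u)
```

Plotting `ys` versus `yf` against the same `alphas` gives the kind of side-by-side flatness comparison the paper uses, here in one dimension of a visualization rather than a full landscape.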


Adversarial Deep Learning for Robust Detection of Binary Encoded Malware

arXiv.org Machine Learning

Malware is constantly adapting in order to avoid detection. Model-based malware detectors, such as SVMs and neural networks, are vulnerable to so-called adversarial examples: modest changes to detectable malware that allow the resulting malware to evade detection. Continuous-valued methods that are robust to adversarial examples of images have been developed using saddle-point optimization formulations. We are inspired by them to develop similar methods for the discrete (e.g., binary) domain that characterizes the features of malware. A specific extra challenge of malware is that adversarial examples must be generated in a way that preserves their malicious functionality. We introduce methods capable of generating functionality-preserving adversarial malware examples in the binary domain. Using the saddle-point formulation, we incorporate the adversarial examples into the training of models that are robust to them. We evaluate the effectiveness of our methods and others in the literature on a set of Portable Executable (PE) files. The comparison prompts our introduction of an online measure, computed during training, to assess the general expectation of robustness.
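A common way to preserve functionality with binary feature vectors (e.g., presence of imported APIs) is to allow only 0-to-1 flips: adding a feature leaves the original behavior intact, while removing one may break it. The sketch below is my own greedy illustration of such a constrained inner maximizer, not the paper's specific algorithm; the toy linear detector and budget `k` are arbitrary.

```python
import numpy as np

def bit_flip_attack(loss_grad, x, k=3):
    """Greedy inner-maximizer sketch: flip up to k zero bits to one,
    choosing the bits with the largest positive loss gradient.  Only
    0->1 flips are allowed, so the original features are all preserved."""
    g = loss_grad(x)
    candidates = np.where((x == 0) & (g > 0))[0]
    best = candidates[np.argsort(g[candidates])[::-1][:k]]
    x_adv = x.copy()
    x_adv[best] = 1
    return x_adv

# Toy linear "detector" loss: flipping high-weight absent bits raises it most.
w = np.array([0.1, 2.0, -1.0, 0.7, 1.5])
x = np.array([1, 0, 0, 0, 0])
x_adv = bit_flip_attack(lambda x: w, x, k=2)
print(x_adv)  # [1 1 0 0 1]
```

In a saddle-point training loop, `x_adv` would replace `x` in the loss for the outer minimization over model parameters.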


Embedded Bandits for Large-Scale Black-Box Optimization

AAAI Conferences

Random embedding has been applied with empirical success to large-scale black-box optimization problems with low effective dimensions. This paper proposes the EmbeddedHunter algorithm, which incorporates the technique in a hierarchical stochastic bandit setting, following the optimism-in-the-face-of-uncertainty principle and breaking away from the multiple-run framework in which random embedding has conventionally been applied, akin to stochastic black-box optimization solvers. Our proposition is motivated by the bounded mean variation in the objective value for a low-dimensional point projected randomly into the decision space of Lipschitz-continuous problems. In essence, the EmbeddedHunter algorithm optimistically expands a partitioning tree over a low-dimensional search space (equal to the effective dimension of the problem), based on a bounded number of random embeddings of points sampled from the low-dimensional space. In contrast to the probabilistic theoretical guarantees of multiple-run random-embedding algorithms, the finite-time analysis of the proposed algorithm presents a theoretical upper bound on the regret as a function of the algorithm's number of iterations. Furthermore, numerical experiments were conducted to validate its performance. The results show a clear performance gain over recently proposed random embedding methods for large-scale problems, provided the intrinsic dimensionality is low.
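The random-embedding idea itself is compact: candidates live in a low-dimensional space and are projected up through a random matrix before each objective evaluation. The sketch below is a bare illustration of that mechanism with naive random search, not the EmbeddedHunter bandit/tree machinery; the dimensions, search box, and objective are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 1000, 2                       # ambient vs. (assumed) effective dimension
A = rng.standard_normal((D, d))      # random embedding matrix

# High-dimensional objective that really depends on two coordinates only.
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

# Search in the d-dimensional space; evaluate through the embedding A @ y.
candidates = [np.zeros(d)] + [rng.uniform(-3.0, 3.0, d) for _ in range(500)]
best = min(f(A @ y) for y in candidates)
```

EmbeddedHunter replaces the blind sampling here with an optimistically expanded partitioning tree over the low-dimensional box, but every evaluation still goes through the same `f(A @ y)` composition.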