Open-source software: How many bugs are hidden there on purpose?


Microsoft-owned GitHub, the world's largest platform for open-source software, has found that 17% of all vulnerabilities in software were planted for malicious purposes. The finding, that almost a fifth of all software bugs were intentionally placed in code by malicious actors, comes from its 2020 Octoverse report, released yesterday. Proprietary software makers have regularly been criticized over the years for 'security through obscurity', or not making source code available for review by experts outside the company. Open source, on the other hand, is seen as a more transparent manner of development because, in theory, it can be vetted by anyone. But the reality is that it's often not vetted due to a lack of funding and human resource constraints.

Why Increasing Instances Of Adversarial Attacks Are Concerning?


Artificial intelligence (AI) is a technology that mimics human intelligence and computational skills. Today, AI tools are employed to find efficiencies, improve decision making, and offer better end-user experiences. They are also used to fight and prevent cybersecurity threats. Companies use AI and its subtype, machine learning (ML), to analyze network traffic for anomalous and suspicious activity. However, there are certain limitations when it comes to applying these tools to security.

13th ACM Workshop on Artificial Intelligence and Security (AISec 2020)


A backdoor is a covert functionality in a machine learning model that causes it to produce incorrect outputs on inputs with a certain "trigger" feature. Recent research on data-poisoning and trojaning attacks has shown how backdoors can be introduced into ML models -- but only for backdoors that act as universal adversarial perturbations (UAPs) and in an inferior threat model that requires the attacker to poison the model and then modify the input at inference time. I will describe a new technique for backdooring ML models based on poisoning the loss-value computation, and demonstrate that it can introduce new types of backdoors which are different and more powerful than UAPs, including (1) single-pixel and physically realizable backdoors in ImageNet, (2) backdoors that switch the model to an entirely different, privacy-violating functionality, e.g., cause a model that counts the number of faces in a photo to covertly recognize specific individuals; and (3) semantic backdoors that do not require the attacker to modify the input at inference time. Oh, and they evade all known defenses, too.

Hitting the Books: The latest 'Little Brother' is a stark cybersecurity thriller


Back in 2008, New York Times best-selling author and Boing Boing alum Cory Doctorow introduced Marcus "w1n5t0n" Yallow to the world in the original Little Brother (which you can still read for free right here). The story follows the teenage computer prodigy's exploits after he and his friends find themselves caught in the aftermath of a terrorist bombing of the Bay Bridge. They must outwit and out-hack the DHS, which has turned San Francisco into a police state. Its sequel, Homeland, catches up with Yallow a few years down the line as he faces an impossible choice between behaving as the heroic hacker his friends see him as and toeing the company line. The third installment, Attack Surface, is a standalone story set in the Little Brother universe. It follows Yallow's archrival, Masha Maximow, an equally talented hacker who finds herself working as a counterterrorism expert for a multinational security firm. By day, she enables tin-pot dictators around the world to repress and surveil their citizens.

Open-sourced Dataset Protection via Backdoor Watermarking

The rapid development of deep learning has benefited from the release of some high-quality open-sourced datasets (e.g., ImageNet), which allow researchers to easily verify the effectiveness of their algorithms. Almost all existing open-sourced datasets stipulate that they may be used only for academic or educational purposes rather than commercial purposes, yet there is still no good way to protect them. In this paper, we propose a backdoor-embedding-based dataset watermarking method to protect an open-sourced image-classification dataset by verifying whether it is used for training a third-party model. Specifically, the proposed method contains two main processes: dataset watermarking and dataset verification. We adopt classical poisoning-based backdoor attacks (e.g., BadNets) for dataset watermarking, i.e., generating some poisoned samples by adding a certain trigger (e.g., a local patch) onto some benign samples, labeled with a pre-defined target class. Based on the proposed backdoor-based watermarking, we use a hypothesis-test-guided method for dataset verification, based on the posterior probability that the suspicious third-party model assigns to the target class for the benign samples and for their correspondingly watermarked samples (i.e., images with the trigger). Experiments on some benchmark datasets are conducted, which verify the effectiveness of the proposed method.
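
The two processes described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bottom-right patch placement, the patch size, and the choice of a paired t statistic as the hypothesis test are all assumptions made for illustration, since the abstract does not specify them. Images are plain nested lists of grayscale values.

```python
import copy
import math
import statistics


def stamp_trigger(image, patch_value=1.0, patch_size=2):
    """Return a copy of a 2-D grayscale image with a square patch stamped
    in the bottom-right corner (hypothetical trigger placement)."""
    stamped = copy.deepcopy(image)
    height, width = len(stamped), len(stamped[0])
    for row in range(height - patch_size, height):
        for col in range(width - patch_size, width):
            stamped[row][col] = patch_value
    return stamped


def watermark_dataset(samples, target_class, indices):
    """Stamp the trigger on the samples at `indices` and relabel them to
    `target_class`; all other (image, label) pairs are copied unchanged."""
    watermarked = []
    for i, (image, label) in enumerate(samples):
        if i in indices:
            watermarked.append((stamp_trigger(image), target_class))
        else:
            watermarked.append((copy.deepcopy(image), label))
    return watermarked


def paired_t_statistic(p_benign, p_watermarked):
    """Paired t statistic for the target-class probabilities a suspicious
    model assigns to benign images and to their watermarked copies.
    Assumes at least two pairs and non-identical differences."""
    diffs = [w - b for b, w in zip(p_benign, p_watermarked)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)
    return mean_diff / (std_diff / math.sqrt(len(diffs)))
```

In this sketch, a large positive statistic would suggest that the trigger raises the target-class probability, which is the behavior expected of a model trained on the watermarked data. The decision threshold and significance level are left to the reader.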

What is machine learning data poisoning?


This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI. It's not hard to tell that the image below shows three different things: a bird, a dog, and a horse. This example portrays one of the dangerous characteristics of machine learning models, which can be exploited to force them into misclassifying data. This is an example of data poisoning, a special type of adversarial attack, a series of techniques that target the behavior of machine learning and deep learning models. If applied successfully, data poisoning can provide malicious actors backdoor access to machine learning models and enable them to bypass systems controlled by artificial intelligence algorithms.

Can Adversarial Weight Perturbations Inject Neural Backdoors?

Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent times. Thus far, the concept of an "adversarial perturbation" has exclusively been used with reference to the input space, referring to a small, imperceptible change which can cause an ML model to err. In this work we extend the idea of "adversarial perturbations" to the space of model weights, specifically to inject backdoors in trained DNNs, which exposes a security risk of using publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model predictions on a non-triggered input. From the perspective of an adversary, we characterize these adversarial perturbations to be constrained within an $\ell_{\infty}$ norm around the original model weights. We introduce adversarial perturbations in the model weights using a composite loss on the predictions of the original model and the desired trigger through projected gradient descent. We empirically show that these adversarial weight perturbations exist universally across several computer vision and natural language processing tasks. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several applications.
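
The constraint mentioned in this abstract, projected gradient descent inside an $\ell_{\infty}$ ball around the original weights, can be sketched in a few lines. This is only an illustration of the projection step: the function names are hypothetical, and `toy_gradient` uses a simple quadratic stand-in loss rather than the composite loss the abstract describes, which would require an actual model and trigger data.

```python
def project_linf(weights, original, eps):
    """Clamp each weight into [w0 - eps, w0 + eps] around its original value w0."""
    return [min(max(w, w0 - eps), w0 + eps) for w, w0 in zip(weights, original)]


def pgd_step(weights, original, grad, lr, eps):
    """Take one gradient descent step, then project back into the l_inf ball."""
    stepped = [w - lr * g for w, g in zip(weights, grad)]
    return project_linf(stepped, original, eps)


def toy_gradient(weights, goal):
    """Gradient of the stand-in loss sum((w - goal)^2).
    An assumption for illustration, NOT the paper's composite loss."""
    return [2.0 * (w - g) for w, g in zip(weights, goal)]


def perturb_weights(original, goal, eps, lr=0.25, steps=50):
    """Run projected gradient descent starting from the original weights."""
    weights = list(original)
    for _ in range(steps):
        grad = toy_gradient(weights, goal)
        weights = pgd_step(weights, original, grad, lr, eps)
    return weights
```

Under these assumptions, a weight whose unconstrained optimum lies farther than `eps` from its original value ends up pinned at the boundary of the ball, while a weight whose optimum lies inside the ball converges to it.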

Noise-response Analysis for Rapid Detection of Backdoors in Deep Neural Networks

The pervasiveness of deep neural networks (DNNs) in technology, matched with the ubiquity of cloud-based training and transfer learning, is giving rise to a new frontier for cybersecurity whereby 'structural malware' is manifest as compromised weights and activation pathways for insecure DNNs. In particular, DNNs can be designed to have backdoors in which an adversary can easily and reliably fool a classifier by adding to any image a pattern of pixels called a trigger. Since DNNs are black-box algorithms, it is generally difficult to detect a backdoor or any other type of structural malware. To efficiently provide a reliable signal for the absence/presence of backdoors, we propose a rapid feature-generation step in which we study how DNNs respond to noise-infused images with varying noise intensity. This results in titration curves, which are a type of 'fingerprinting' for DNNs. We find that DNNs with backdoors are more sensitive to input noise and respond in a characteristic way that reveals the backdoor and where it leads (i.e., its target). Our empirical results demonstrate that we can accurately detect a backdoor with high confidence orders-of-magnitude faster than existing approaches (i.e., seconds versus hours). Our method also yields a titration-score that can automate the detection of compromised DNNs, whereas existing backdoor-detection strategies are not automated.
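
The general idea of probing a classifier with noise of increasing intensity can be sketched as follows. The abstract does not define the titration curve or titration score precisely, so this sketch makes its own assumptions: it adds Gaussian noise to flattened images and records, for each noise level, the fraction of noisy inputs assigned to each class. The name `noise_response_curve` and the `predict` callable (any function mapping a list of pixel values to a class index) are hypothetical.

```python
import random


def noise_response_curve(predict, images, noise_levels, num_classes, seed=0):
    """For each noise level sigma, add Gaussian noise to every image and
    return the fraction of noisy images assigned to each class.

    predict: callable taking a flat list of pixel values, returning a class index.
    images:  list of flat lists of pixel values.
    """
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    curve = []
    for sigma in noise_levels:
        counts = [0] * num_classes
        for image in images:
            noisy = [pixel + rng.gauss(0.0, sigma) for pixel in image]
            counts[predict(noisy)] += 1
        curve.append([count / len(images) for count in counts])
    return curve
```

One way to read such a curve, in the spirit of the abstract, is to look for a single class whose share grows unusually fast as the noise level rises. Whether that matches the authors' score is not something this sketch can confirm.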

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, along with methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in an FL model is unlikely assuming first-order oracles or polynomial time. We couple our theoretical results with a new family of backdoor attacks, which we refer to as edge-case backdoors. An edge-case backdoor forces a model to misclassify seemingly easy inputs that are, however, unlikely to be part of the training or test data, i.e., they live on the tail of the input distribution. We explain how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness, and show that, with careful tuning on the side of the adversary, one can insert them across a range of machine learning tasks (e.g., image classification, OCR, text prediction, sentiment analysis).

A new measure for overfitting and its implications for backdooring of deep learning

Overfitting describes the phenomenon that a machine learning model fits the given data instead of learning the underlying distribution. Existing approaches are computationally expensive, require large amounts of labeled data, consider overfitting a global phenomenon, and often compute a single measurement. Instead, we propose a local measurement around a small number of unlabeled test points to obtain features of overfitting. Our extensive evaluation shows that the measure can reflect the model's different fit of training and test data, identify changes of the fit during training, and even suggest different fit among classes. We further apply our method to verify whether backdoors rely on overfitting, a common claim in the security of deep learning. Instead, we find that backdoors rely on underfitting. Our findings also provide evidence that even unbackdoored neural networks contain patterns similar to backdoors that are reliably classified as one class.