 trojannet


TrojanNet – a simple yet effective attack on machine learning models

#artificialintelligence

Injecting malicious backdoors into deep neural networks is easier than previously thought, a new study by researchers at Texas A&M University shows. There is growing concern about the security implications of deep learning algorithms, which are becoming an integral part of applications across different sectors. Vulnerabilities in deep neural networks (DNNs), the main technology behind deep learning, have become a growing area of interest in recent years. A Trojan attack embeds a hidden trigger in a neural network, letting a malicious actor make the AI model misbehave on demand. For instance, an attacker can fool the image processor of a self-driving car into ignoring a stop sign or mistaking it for a speed limit sign.


Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

arXiv.org Machine Learning

When training data are maliciously tampered with, an adversary can manipulate the predictions of the resulting deep neural network (DNN); this is known as a Trojan attack (or poisoning backdoor attack). The lack of robustness of DNNs against Trojan attacks could significantly harm real-life machine learning (ML) systems in downstream applications, raising widespread concern about their trustworthiness. In this paper, we study the problem of Trojan network (TrojanNet) detection in the data-scarce regime, where the detector can access only the weights of a trained DNN. We first propose a data-limited TrojanNet detector (TND) for the case where only a few data samples are available for TrojanNet detection. We show that an effective data-limited TND can be established by exploring connections between Trojan attacks and prediction-evasion adversarial attacks, including per-sample attacks as well as all-sample universal attacks. In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples. We show that such a TND can be built by leveraging the internal response of hidden neurons, which exhibits the Trojan behavior even on random noise inputs. The effectiveness of our proposals is evaluated by extensive experiments under different model architectures and datasets, including CIFAR-10, GTSRB, and ImageNet.
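
To make the data-poisoning setting in the abstract above concrete, here is a minimal sketch of how a poisoning backdoor is commonly planted: a small trigger patch is stamped onto a fraction of the training images, and those images are relabeled to an attacker-chosen class. This is an illustration of the general idea only, not the paper's method or its detector. The function names, the patch shape and location, the poisoning rate, and the toy data are all assumptions made for the example.

```python
import random

def stamp_trigger(image, patch_size=2, value=1.0):
    """Return a copy of a 2-D grayscale image with a square patch
    written into its bottom-right corner (hypothetical trigger)."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped

def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of the samples and
    relabel them to target_label. The inputs are left unmodified."""
    rng = random.Random(seed)
    n_poison = int(len(images) * rate)
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, sorted(chosen)

# Toy data (assumed): twenty blank 4x4 images with labels 0..2.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(20)]
labels = [i % 3 for i in range(20)]
poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    images, labels, target_label=7, rate=0.1
)
```

A model trained on such a set may learn to associate the patch with the target class while behaving normally on clean inputs, which is the behavior a Trojan detector tries to expose.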


TrojanNet: Embedding Hidden Trojan Horse Models in Neural Networks

arXiv.org Machine Learning

The complexity of large-scale neural networks can lead to poor understanding of their internal details. We show that this opaqueness provides an opportunity for adversaries to embed unintended functionalities into the network in the form of Trojan horses. Our novel framework hides the existence of a Trojan network with arbitrary desired functionality within a benign transport network. We prove theoretically that the Trojan network's detection is computationally infeasible and demonstrate empirically that the transport network does not compromise its disguise. Our paper exposes an important, previously unknown loophole that could potentially undermine the security and trustworthiness of machine learning.