

Supplementary Material for Understanding and Improving Ensemble Adversarial Defense

Neural Information Processing Systems

They are used to test the proposed enhancement approach iGAT. In general, ADP employs an ensemble by averaging, i.e., $h(x) = \frac{1}{C}\sum_{c=1}^{C} h_c(x)$. Adversarial examples are generated to compute the losses by using the PGD attack. Our main theorem builds on a supporting Lemma 2.1. We start from the cross-entropy loss curvature measured by Eq. The above new expression of $T(x)$ helps bound the difference between $h(x)$ and $h(\tilde{x})$. Note that these three cases are mutually exclusive.
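The snippet names two concrete ingredients: prediction averaging across base classifiers and PGD-generated adversarial examples. Below is a minimal PyTorch sketch of both, assuming `models` is a list of trained base classifiers; the budget `eps`, step size `alpha`, and step count are illustrative choices, not values from the supplementary material.

```python
import torch
import torch.nn.functional as F

def ensemble_probs(models, x):
    # "Ensemble by averaging": mean of the base classifiers' softmax outputs.
    return torch.stack([F.softmax(m(x), dim=1) for m in models]).mean(dim=0)

def pgd_attack(models, x, y, eps=8/255, alpha=2/255, steps=10):
    # L-infinity PGD against the averaged ensemble; random start inside the ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.nll_loss(torch.log(ensemble_probs(models, x_adv) + 1e-12), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()  # ascend the loss
        # project back into the eps-ball around x and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```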






Due to the time constraints of the rebuttal, we limited

Neural Information Processing Systems

We cannot thank the reviewers enough for their valuable feedback on our work. Reviewers 1 and 2: Combine guess loss with additive noise. Most recent advances in adversarial defense methods address "black-box attacks" performed by a ... The latter incorporates adversarial examples during training to increase the model's robustness to the attack. Therefore, the reconstructed image can serve as an adversarially perturbed example of the non-adversarial input image. Reviewer 3: Novelty is not enough, as most of the proposed solutions or observations are already published.


Towards Stable and Robust AdderNets (Supplementary Material)

Dong, Minjing, Wang, Yunhe

Neural Information Processing Systems

As shown in Table 1, the stability of adversarial robustness is evaluated under different inference settings. We first show whether shuffling the test set influences the performance. In the main body, we mainly focus on the comparison with CNN, since we want to highlight the natural robustness of AdderNet compared to CNN under the same setting. We further evaluate the performance of AWN on CNNs. As shown in Table 3, with the involvement of AWN, CNN obtains slightly better adversarial robustness.
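A minimal sketch of the inference-stability check described above: robust accuracy is measured with the test set in fixed versus shuffled order, and a stable defense should yield matching numbers. Here `model`, `dataset`, and `attack` are placeholders, and the batch size is an illustrative choice.

```python
import torch
from torch.utils.data import DataLoader

def robust_accuracy(model, dataset, attack, shuffle, batch_size=128):
    # `attack(model, x, y)` is assumed to return adversarial examples,
    # e.g. a PGD routine like the sketch earlier in this digest.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
    model.eval()
    correct = total = 0
    for x, y in loader:
        x_adv = attack(model, x, y)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# Stability check: the two values should agree for a stable defense.
# acc_fixed    = robust_accuracy(net, testset, pgd, shuffle=False)
# acc_shuffled = robust_accuracy(net, testset, pgd, shuffle=True)
```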


A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

Xu, Zihao, Liu, Yi, Deng, Gelei, Li, Yuekang, Picek, Stjepan

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have increasingly become central to generating content with potential societal impacts. Notably, these models have demonstrated capabilities for generating content that could be deemed harmful. To mitigate these risks, researchers have adopted safety training techniques to align model outputs with societal values and curb the generation of malicious content. However, the phenomenon of "jailbreaking", where carefully crafted prompts elicit harmful responses from models, persists as a significant challenge. This research conducts a comprehensive analysis of existing studies on jailbreaking LLMs and their defense techniques. We meticulously investigate nine attack techniques and seven defense techniques applied across three distinct language models: Vicuna, Llama, and GPT-3.5 Turbo, aiming to evaluate the effectiveness of these attack and defense techniques. Our findings reveal that existing white-box attacks underperform compared to universal techniques and that including special tokens in the input significantly affects the likelihood of successful attacks. This research highlights the need to concentrate on the security facets of LLMs. Additionally, we contribute to the field by releasing our datasets and testing framework, aiming to foster further research into LLM security. We believe these contributions will facilitate the exploration of security measures within this domain.
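One finding above is that special tokens in the input shift attack success rates. A hypothetical harness sketch of how such a comparison could be run is shown below; `query_model`, the refusal markers, and the `[INST]` wrapper are illustrative placeholders, not part of the paper's released framework.

```python
# Crude keyword heuristic: treat any non-refusal response as a "success".
REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't assist")

def attack_succeeds(response: str) -> bool:
    return not any(marker in response for marker in REFUSAL_MARKERS)

def success_rate(query_model, prompts, special_token=None):
    # Fraction of jailbreak prompts that elicit a non-refusal response,
    # optionally wrapping each prompt in a special token such as "[INST]".
    hits = 0
    for p in prompts:
        if special_token is not None:
            p = f"{special_token} {p} {special_token}"
        hits += attack_succeeds(query_model(p))
    return hits / len(prompts)

# Compare, e.g.:
# success_rate(ask_vicuna, jailbreak_prompts)
# success_rate(ask_vicuna, jailbreak_prompts, special_token="[INST]")
```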


Understanding and Improving Ensemble Adversarial Defense

Deng, Yian, Mu, Tingting

arXiv.org Artificial Intelligence

Ensemble strategies have become popular in adversarial defense, training multiple base classifiers to defend against adversarial attacks in a cooperative manner. Despite the empirical success, theoretical explanations of why an ensemble of adversarially trained classifiers is more robust than a single one remain unclear. To fill this gap, we develop a new error theory dedicated to understanding ensemble adversarial defense, demonstrating a provable 0-1 loss reduction on challenging sample sets in an adversarial defense scenario. Guided by this theory, we propose an effective approach to improve ensemble adversarial defense, named interactive global adversarial training (iGAT). The proposal includes (1) a probabilistic distributing rule that selectively allocates adversarial examples that are globally challenging to the ensemble among the different base classifiers, and (2) a regularization term to rescue the severest weaknesses of the base classifiers. Tested with various existing ensemble adversarial defense techniques, iGAT boosts their performance by up to 17% on the CIFAR10 and CIFAR100 datasets under both white-box and black-box attacks.
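A minimal sketch of one plausible instantiation of such a probabilistic distributing rule: each globally adversarial example is routed to a single base classifier, sampled with probability proportional to that classifier's softmax score on the true class. The proportional-to-true-class-probability routing is an illustrative assumption, not necessarily the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def distributed_loss(models, x_adv, y):
    # Route each adversarial example to one base classifier at random,
    # weighted by how confidently each classifier handles it.
    with torch.no_grad():
        # scores[c, i]: classifier c's probability for example i's true label
        scores = torch.stack([
            F.softmax(m(x_adv), dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
            for m in models
        ])                                              # shape: (C, B)
    probs = scores / scores.sum(dim=0, keepdim=True)    # normalize per example
    assign = torch.multinomial(probs.t(), num_samples=1).squeeze(1)  # (B,)
    # Each base classifier trains only on the examples allocated to it.
    losses = []
    for c, m in enumerate(models):
        idx = (assign == c).nonzero(as_tuple=True)[0]
        if idx.numel() > 0:
            losses.append(F.cross_entropy(m(x_adv[idx]), y[idx]))
    return torch.stack(losses).sum()
```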


Adversarial attacks and defenses on ML- and hardware-based IoT device fingerprinting and identification

Sánchez, Pedro Miguel Sánchez, Celdrán, Alberto Huertas, Bovet, Gérôme, Pérez, Gregorio Martínez

arXiv.org Artificial Intelligence

In recent years, the number of deployed IoT devices has exploded, reaching the scale of billions. However, this growth has been accompanied by new cybersecurity issues, such as the deployment of unauthorized devices, malicious code modification, malware deployment, and vulnerability exploitation. This has motivated the need for new device identification mechanisms based on behavior monitoring. These solutions have recently leveraged Machine and Deep Learning techniques, owing to advances in the field and increased processing capabilities. In turn, attackers have not stood still and have developed adversarial attacks, focused on context modification and ML/DL evasion, against IoT device identification solutions. This work explores the performance of hardware behavior-based individual device identification, how it is affected by possible context- and ML/DL-focused attacks, and how its resilience can be improved using defense techniques. To this end, it proposes an LSTM-CNN architecture based on hardware performance behavior for individual device identification. Previous techniques are then compared with the proposed architecture using a hardware performance dataset collected from 45 Raspberry Pi devices running identical software. The LSTM-CNN improves on previous solutions, achieving an average F1-score of 0.96 and a minimum TPR of 0.8 across all devices. Afterward, context- and ML/DL-focused adversarial attacks were applied against the model to test its robustness. A temperature-based context attack was unable to disrupt identification, but some state-of-the-art ML/DL evasion attacks were successful. Finally, adversarial training and model distillation defense techniques were selected to improve the model's resilience to evasion attacks without degrading its performance.
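A minimal PyTorch sketch of an LSTM-CNN classifier in the spirit described above, assuming the input is a time series of hardware-performance features: an LSTM summarizes the temporal behavior, 1-D convolutions extract local patterns, and a linear head classifies among the 45 devices. The layer sizes and feature count are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LSTMCNN(nn.Module):
    def __init__(self, n_features=10, n_devices=45, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv1d(hidden, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),   # pool over the time dimension
        )
        self.head = nn.Linear(32, n_devices)

    def forward(self, x):              # x: (batch, time, n_features)
        seq, _ = self.lstm(x)          # (batch, time, hidden)
        z = self.conv(seq.transpose(1, 2)).squeeze(-1)  # (batch, 32)
        return self.head(z)            # device logits

# model = LSTMCNN()
# logits = model(torch.randn(8, 100, 10))  # 8 windows of 100 time steps
```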