Lin, Chenhao
Revisiting Training-Inference Trigger Intensity in Backdoor Attacks
Lin, Chenhao, Zhao, Chenyang, Wang, Shiwei, Wang, Longtian, Shen, Chao, Zhao, Zhengyu
Backdoor attacks typically place a specific trigger on certain training data, such that the model makes prediction errors on inputs with that trigger during inference. Despite the core role of the trigger, existing studies have commonly assumed that a perfect match between training and inference triggers is optimal. In this paper, for the first time, we systematically explore the training-inference trigger relation, particularly focusing on their mismatch, based on a Training-Inference Trigger Intensity Manipulation (TITIM) workflow. TITIM specifically investigates the training-inference trigger intensity, such as the size or the opacity of a trigger, and reveals new insights into trigger generalization and overfitting. These new insights challenge the above common belief by demonstrating that a training-inference trigger mismatch can facilitate attacks in two practical scenarios, posing more significant security threats than previously thought. First, when the inference trigger is fixed, using training triggers with mixed intensities leads to stronger attacks than using any single intensity. For example, on CIFAR-10 with ResNet-18, mixing training triggers with 1.0 and 0.1 opacities improves the worst-case attack success rate (ASR) (over different testing opacities) of the best single-opacity attack from 10.61% to 92.77%. Second, intentionally using certain mismatched training-inference triggers can improve attack stealthiness, i.e., better bypass defenses. For example, compared to a training/inference intensity of 1.0/1.0, using 1.0/0.7 decreases the area under the curve (AUC) of the Scale-Up defense from 0.96 to 0.62, while maintaining a high ASR (99.65% vs. 91.62%). These new insights are validated to generalize across different backdoor attacks, models, datasets, tasks, and (digital/physical) domains.
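For illustration only, a minimal PyTorch sketch of how mixed-intensity trigger poisoning could look; this is not the authors' TITIM code, and the trigger pattern, opacity set, and poison rate below are assumptions.

```python
import random
import torch

def poison_batch(images, labels, trigger, target_label,
                 opacities=(1.0, 0.1), poison_rate=0.1):
    """Blend a trigger into a random subset of a batch, sampling the opacity
    per poisoned image from a set of training intensities (illustrative only).
    images: (B, C, H, W) in [0, 1]; trigger: (C, H, W) in [0, 1]."""
    images, labels = images.clone(), labels.clone()
    n_poison = max(1, int(poison_rate * images.size(0)))
    idx = torch.randperm(images.size(0))[:n_poison]
    for i in idx:
        alpha = random.choice(opacities)                    # mixed training intensities
        images[i] = (1 - alpha) * images[i] + alpha * trigger  # alpha-blended trigger
        labels[i] = target_label                            # relabel to the attack target
    return images, labels
```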
Improving Integrated Gradient-based Transferable Adversarial Examples by Refining the Integration Path
Ren, Yuchen, Zhao, Zhengyu, Lin, Chenhao, Yang, Bo, Zhou, Lu, Liu, Zhe, Shen, Chao
Transferable adversarial examples are known to cause threats in practical, black-box attack scenarios. A notable approach to improving transferability is using integrated gradients (IG), a technique originally developed for model interpretability. In this paper, we find that existing IG-based attacks have limited transferability due to their naive adoption of IG as used in model interpretability. To address this limitation, we focus on the IG integration path and refine it in three aspects: multiplicity, monotonicity, and diversity, supported by theoretical analyses. We propose the Multiple Monotonic Diversified Integrated Gradients (MuMoDIG) attack, which generates highly transferable adversarial examples against different CNN and ViT models and defenses. Experiments validate that MuMoDIG outperforms the latest IG-based attack by up to 37.3% and other state-of-the-art attacks by 8.4%. In general, our study reveals that migrating established techniques to improve transferability may require non-trivial effort. Code is available at https://github.com/RYC-98/MuMoDIG.
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
Ji, Zhoulin, Lin, Chenhao, Wang, Hang, Shen, Chao
Distinguishing synthetic speech from real speech is increasingly crucial due to the risks of misinformation and identity impersonation. While various datasets for synthetic speech analysis have been developed, they often focus on specific areas, limiting their utility for comprehensive research. To fill this gap, we propose the Speech-Forensics dataset, which extensively covers authentic, synthetic, and partially forged speech samples that include multiple segments synthesized by different high-quality algorithms. Moreover, we propose a TEmporal Speech LocalizaTion network, called TEST, which simultaneously performs authenticity detection, localization of multiple fake segments, and recognition of synthesis algorithms, without any complex post-processing. TEST effectively integrates LSTM and Transformer layers to extract more powerful temporal speech representations and utilizes dense prediction on multi-scale pyramid features to estimate the synthetic spans. Our model achieves an average mAP of 83.55% and an EER of 5.25% at the utterance level. At the segment level, it attains an EER of 1.07% and a 92.19% F1 score. These results highlight the model's robust capability for comprehensive analysis of synthetic speech, offering a promising avenue for future research and practical applications in this field.
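To make the LSTM-plus-Transformer temporal modeling concrete, here is a minimal PyTorch sketch of such an encoder; the layer sizes and feature dimension are illustrative assumptions, not the paper's configuration, and the localization head is omitted.

```python
import torch
import torch.nn as nn

class TemporalSpeechEncoder(nn.Module):
    """Minimal LSTM + Transformer encoder in the spirit of TEST (sizes assumed)."""
    def __init__(self, feat_dim=80, hidden=256, n_heads=4, n_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        enc_layer = nn.TransformerEncoderLayer(d_model=2 * hidden, nhead=n_heads,
                                               batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=n_layers)

    def forward(self, x):            # x: (B, T, feat_dim) frame-level acoustic features
        h, _ = self.lstm(x)          # local temporal modeling
        return self.transformer(h)   # global context for frame-level span prediction
```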
Can Targeted Clean-Label Poisoning Attacks Generalize?
Chen, Zhizhen, Dutta, Subrat Kishore, Zhao, Zhengyu, Lin, Chenhao, Shen, Chao, Zhang, Xiao
Targeted poisoning attacks aim to compromise a model's predictions on specific target samples. In the common clean-label setting, they are achieved by slightly perturbing a subset of training samples, given access to those specific targets. Despite continuous efforts, it remains unexplored whether such attacks can generalize to unknown variations of those targets. In this paper, we take the first step to systematically study this generalization problem. Observing that the widely adopted, cosine similarity-based attack exhibits limited generalizability, we propose a well-generalizable attack that leverages both the direction and the magnitude of model gradients. In particular, we explore diverse target variations, such as an object with varied viewpoints and an animal species with distinct appearances. Extensive experiments across various generalization scenarios demonstrate that our method consistently achieves the best attack effectiveness. For example, our method outperforms the cosine similarity-based attack by 20.95% in attack success rate with similar overall accuracy, averaged over four models on two image benchmark datasets. The code is available at https://github.com/jiaangk/generalizable_tcpa
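The contrast between the two alignment objectives can be sketched as follows; this is not the authors' exact formulation, only an illustration of matching gradient direction alone versus direction plus magnitude over flattened gradient vectors.

```python
import torch
import torch.nn.functional as F

def alignment_losses(poison_grads, target_grads):
    """Illustrative gradient-alignment objectives (assumed, simplified forms).
    poison_grads / target_grads: lists of per-parameter gradient tensors."""
    g_p = torch.cat([g.flatten() for g in poison_grads])
    g_t = torch.cat([g.flatten() for g in target_grads])
    cosine_loss = 1 - F.cosine_similarity(g_p, g_t, dim=0)  # matches direction only
    l2_loss = (g_p - g_t).pow(2).sum()                       # penalizes direction and magnitude
    return cosine_loss, l2_loss
```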
A Survey of Deep Learning Library Testing Methods
Zhang, Xiaoyu, Jiang, Weipeng, Shen, Chao, Li, Qi, Wang, Qian, Lin, Chenhao, Guan, Xiaohong
In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people's lives in many aspects. As the backbone of these DL systems, various DL libraries undertake the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety. Studying the characteristics of DL libraries, their associated bugs, and the corresponding testing methods is crucial for enhancing the security of DL systems and advancing the widespread application of DL technology. This paper provides an overview of testing research related to various DL libraries, discusses the strengths and weaknesses of existing methods, and offers guidance and references for the application of DL libraries. It first introduces the workflow of DL underlying libraries and the characteristics of the three kinds of DL libraries involved, namely DL frameworks, DL compilers, and DL hardware libraries. It then provides definitions for DL underlying library bugs and testing. Additionally, it summarizes the existing testing methods and tools tailored to each of these DL libraries and analyzes their effectiveness and limitations. Finally, it discusses the existing challenges of DL library testing and outlines potential directions for future research.
Towards Benchmarking and Evaluating Deepfake Detection
Lin, Chenhao, Deng, Jingyi, Hu, Pengbin, Shen, Chao, Wang, Qian, Li, Qi
Deepfake detection automatically recognizes manipulated media through analysis of the differences between manipulated and non-altered videos. It is natural to ask which of the existing deepfake detection approaches are the top performers, so as to identify promising research directions and provide practical guidance. Unfortunately, it is difficult to conduct a sound benchmarking comparison of existing detection approaches using the results in the literature because evaluation conditions are inconsistent across studies. Our objective is to establish a comprehensive and consistent benchmark, to develop a repeatable evaluation procedure, and to measure the performance of a range of detection approaches so that the results can be compared soundly. A challenging dataset consisting of manipulated samples generated by more than 13 different methods has been collected, and 11 popular detection approaches (9 algorithms) from the existing literature have been implemented and evaluated with 6 fair and practical evaluation metrics. In total, 92 models have been trained and 644 experiments have been performed for the evaluation. The results, along with the shared data and evaluation methodology, constitute a benchmark for comparing deepfake detection approaches and measuring progress.
SSL-Auth: An Authentication Framework by Fragile Watermarking for Pre-trained Encoders in Self-supervised Learning
Li, Xiaobei, Yin, Changchun, Zhu, Liyue, Xu, Xiaogang, Fang, Liming, Wang, Run, Lin, Chenhao
Self-supervised learning (SSL), a paradigm harnessing unlabeled datasets to train robust encoders, has recently witnessed substantial success. These encoders serve as pivotal feature extractors for downstream tasks, demanding significant computational resources. Nevertheless, recent studies have shed light on vulnerabilities in pre-trained encoders, including backdoor and adversarial threats. Safeguarding the intellectual property of encoder trainers and ensuring the trustworthiness of deployed encoders pose notable challenges in SSL. To bridge these gaps, we introduce SSL-Auth, the first authentication framework designed explicitly for pre-trained encoders. SSL-Auth leverages selected key samples and employs a well-trained generative network to reconstruct watermark information, thus affirming the integrity of the encoder without compromising its performance. By comparing the reconstruction outcomes of the key samples, we can identify any malicious alterations. Comprehensive evaluations conducted on a range of encoders and diverse downstream tasks demonstrate the effectiveness of our proposed SSL-Auth.
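A minimal sketch of the kind of integrity check described above, assuming an encoder, a watermark-reconstruction generator, enrolled key samples, and reference watermarks are already available; the function names and the threshold tau are assumptions, not SSL-Auth's API.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def verify_encoder(encoder, generator, key_samples, reference_watermarks, tau=0.05):
    """Illustrative SSL-Auth-style verification (assumed components and threshold)."""
    feats = encoder(key_samples)                   # embeddings of the selected key samples
    recon = generator(feats)                       # reconstructed watermark information
    err = F.mse_loss(recon, reference_watermarks)  # deviation from the enrolled watermarks
    return err.item() <= tau                       # large deviation suggests tampering
```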
Towards Deep Learning Models Resistant to Transfer-based Adversarial Attacks via Data-centric Robust Learning
Yang, Yulong, Lin, Chenhao, Ji, Xiang, Tian, Qiwei, Li, Qian, Yang, Hongshan, Wang, Zhibo, Shen, Chao
Transfer-based adversarial attacks pose a severe threat to real-world deep learning systems since they do not require access to target models. Adversarial training (AT), which is recognized as the strongest defense against white-box attacks, has also been shown to provide high robustness against (black-box) transfer-based attacks. However, AT suffers from heavy computational overhead since it optimizes adversarial examples throughout the whole training process. In this paper, we demonstrate that such heavy optimization is not necessary for defending against transfer-based attacks. Instead, a one-shot adversarial augmentation prior to training is sufficient, and we name this new defense paradigm Data-centric Robust Learning (DRL). Our experimental results show that DRL outperforms widely used AT techniques (e.g., PGD-AT, TRADES, EAT, and FAT) in terms of black-box robustness and even surpasses the top-1 defense on RobustBench when combined with diverse data augmentations and loss regularizations. We also identify other benefits of DRL, such as improved model generalization and robust fairness.
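The data-centric idea can be sketched as follows: craft adversarial examples once, before training, and then train a fresh model on the augmented set with standard (non-adversarial) training. This is an assumed simplification, not the paper's pipeline; attack_fn stands for any off-the-shelf attack such as PGD.

```python
import torch
from torch.utils.data import TensorDataset

def build_drl_dataset(model, loader, attack_fn):
    """One-shot adversarial augmentation prior to training (illustrative sketch)."""
    aug_x, aug_y = [], []
    for x, y in loader:
        x_adv = attack_fn(model, x, y)       # craft adversarial examples once
        aug_x.append(torch.cat([x, x_adv]))  # keep both clean and adversarial copies
        aug_y.append(torch.cat([y, y]))
    return TensorDataset(torch.cat(aug_x), torch.cat(aug_y))  # train normally on this set
```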
Hard Adversarial Example Mining for Improving Robust Fairness
Lin, Chenhao, Ji, Xiang, Yang, Yulong, Li, Qian, Shen, Chao, Wang, Run, Fang, Liming
Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AEs). Nevertheless, recent studies have revealed that adversarially trained models are prone to unfairness problems, i.e., a noticeable disparity in accuracy between different classes, restricting their applicability in real-world scenarios. In this paper, we empirically observe that this limitation may be attributed to serious adversarial confidence overfitting, i.e., certain adversarial examples with overconfidence. To alleviate this problem, we propose HAM, a straightforward yet effective framework via adaptive Hard Adversarial example Mining. HAM concentrates on mining hard adversarial examples while discarding the easy ones in an adaptive fashion.
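As an illustration of the mining idea (not the authors' exact criterion), the sketch below keeps the adversarial examples on which the model is least confident in the true class and discards the easy, over-confident ones; the keep ratio is an assumption.

```python
import torch
import torch.nn.functional as F

def select_hard_adversarial(model, x_adv, y, keep_ratio=0.7):
    """Illustrative hard adversarial example mining step (assumed criterion)."""
    with torch.no_grad():
        probs = F.softmax(model(x_adv), dim=1)
        true_conf = probs.gather(1, y.unsqueeze(1)).squeeze(1)  # confidence on the true label
    n_keep = max(1, int(keep_ratio * x_adv.size(0)))
    hard_idx = torch.argsort(true_conf)[:n_keep]                # lowest confidence = hardest
    return x_adv[hard_idx], y[hard_idx]                         # easy examples are discarded
```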
End-to-end Face-swapping via Adaptive Latent Representation Learning
Lin, Chenhao, Hu, Pengbin, Shen, Chao, Li, Qian
Taking full advantage of the excellent performance of StyleGAN, style transfer-based face swapping methods have been extensively investigated recently. However, these studies require separate face segmentation and blending modules for successful face swapping, and their fixed selection of the manipulated latent codes is ad hoc, degrading face-swapping quality, generalizability, and practicality. This paper proposes a novel, end-to-end integrated framework for high-resolution, attribute-preserving face swapping via Adaptive Latent Representation Learning. Specifically, we first design a multi-task dual-space face encoder that shares the underlying feature extraction network to simultaneously perform facial region perception and face encoding. This encoder enables us to control the face pose and attributes individually, thus enhancing face-swapping quality. Next, we propose an adaptive latent-code swapping module to adaptively learn the mapping between facial attributes and latent codes and to select effective latent codes for improved retention of facial attributes. Finally, the initial face-swapping image generated by StyleGAN2 is blended with the facial region mask produced by our encoder to address the background blur problem. By integrating facial perception and blending into the end-to-end training and testing process, our framework achieves highly realistic face swapping on in-the-wild faces without segmentation masks. Experimental results demonstrate the superior performance of our approach over state-of-the-art methods.
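The final mask-based blending step can be summarized by a one-line operation; the tensor shapes and value ranges below are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def blend_swapped_face(generated, target, face_mask):
    """Paste the generated face region onto the original target frame so the
    background stays sharp (illustrative sketch).
    generated, target: (B, 3, H, W) in [0, 1]; face_mask: (B, 1, H, W) in [0, 1]."""
    return face_mask * generated + (1 - face_mask) * target
```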