Strategic Sample Selection for Improved Clean-Label Backdoor Attacks in Text Classification

Aug-25-2025–arXiv.org Artificial Intelligence

Backdoor attacks pose a significant threat to the integrity of text classification models used in natural language processing. While several dirty-label attacks that achieve high attack success rates (ASR) have been proposed, clean-label attacks are inherently more difficult. In this paper, we propose three sample selection strategies to improve attack effectiveness in clean-label scenarios: Minimum, Above50, and Below50. Our strategies identify those samples which the model predicts incorrectly or with low confidence, and by injecting backdoor triggers into such samples, we aim to induce a stronger association between the trigger patterns and the attacker-desired target label. We apply our methods to clean-label variants of four canonical backdoor attacks (Insert-Sent, WordInj, StyleBkd, SynBkd) and evaluate them on three datasets (IMDB, SST2, HateSpeech) and four model types (LSTM, BERT, DistilBERT, RoBERTa). Results show that the proposed strategies, particularly the Minimum strategy, significantly improve the ASR over random sample selection with little or no degradation in the model's clean accuracy. Furthermore, clean-label attacks enhanced by our strategies outperform BITE, a state-of-the-art clean-label attack method, in many configurations.
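The selection idea in the abstract can be illustrated with a minimal sketch. The abstract names the three strategies but does not define them, so the rules below are assumptions inferred from the names only: Minimum takes the samples with the lowest confidence in the target label, Above50 takes the least confident samples at or above 0.5, and Below50 takes the samples closest to 0.5 from below. The function `select_poison_candidates` and its input `target_conf` (a clean model's predicted probability of the target label for each target-class training sample) are hypothetical names, not taken from the paper.

```python
from typing import List, Sequence


def select_poison_candidates(
    target_conf: Sequence[float], k: int, strategy: str = "minimum"
) -> List[int]:
    """Return indices of up to k target-class samples to poison.

    target_conf[i] is the (assumed) probability a clean model assigns to
    the target label for sample i. All three rules are guesses based on
    the strategy names, not the paper's definitions.
    """
    indices = range(len(target_conf))
    if strategy == "minimum":
        # Lowest confidence overall, whether or not the prediction is correct.
        ranked = sorted(indices, key=lambda i: target_conf[i])
    elif strategy == "above50":
        # Correctly predicted (binary case) but with the weakest confidence.
        pool = [i for i in indices if target_conf[i] >= 0.5]
        ranked = sorted(pool, key=lambda i: target_conf[i])
    elif strategy == "below50":
        # Incorrectly predicted (binary case), closest to the 0.5 boundary.
        pool = [i for i in indices if target_conf[i] < 0.5]
        ranked = sorted(pool, key=lambda i: target_conf[i], reverse=True)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]


# Toy confidences for six target-class samples (made-up numbers).
conf = [0.95, 0.10, 0.55, 0.45, 0.70, 0.30]
minimum_pick = select_poison_candidates(conf, 2, "minimum")
above50_pick = select_poison_candidates(conf, 2, "above50")
below50_pick = select_poison_candidates(conf, 2, "below50")
```

In a clean-label setting the trigger would then be inserted only into the selected samples, whose labels are left unchanged; random selection would correspond to sampling `k` indices uniformly instead.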

artificial intelligence, machine learning, natural language

Genre:
- Research Report > New Finding (0.66)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)