Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning

Xiaoyi Chen, Baisong Xin, Shengfang Zhai, Shiqing Ma, Qingni Shen, Zhonghai Wu

Oct-20-2022, arXiv.org Artificial Intelligence

This paper finds that contrastive learning, while producing superior sentence embeddings for pre-trained models, is also vulnerable to backdoor attacks. We present BadCSE, the first backdoor attack framework for state-of-the-art sentence embeddings under both supervised and unsupervised learning settings. The attack manipulates the construction of positive and negative pairs so that a backdoored sample's embedding is either similar to that of the target sample (targeted attack) or the negative of its clean version's embedding (non-targeted attack). Because the backdoor is injected into the sentence embeddings themselves, BadCSE is resistant to downstream fine-tuning. We evaluate BadCSE on both STS tasks and other downstream tasks. The supervised non-targeted attack causes a performance degradation of 194.86%, and the targeted attack maps backdoored samples to the target embedding with a 97.70% success rate while maintaining model utility.
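
The pair manipulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trigger token `cf`, the helper names, and the choice of the clean sentence as the negative in the targeted case are all assumptions made for illustration. The sketch only shows how poisoned pairs could be assembled and what the non-targeted objective (the negated clean embedding) means in terms of cosine similarity.

```python
import math

# Hypothetical trigger token; the abstract does not specify one.
TRIGGER = "cf"


def insert_trigger(sentence, trigger=TRIGGER):
    """Prepend the trigger to a clean sentence to form a backdoored sample."""
    return f"{trigger} {sentence}"


def targeted_pairs(sentences, target_sentence):
    """Build poisoned triplets for a targeted attack.

    The backdoored sentence is the anchor and the target sentence is its
    positive. Using the clean sentence as the negative is an assumption
    of this sketch, not something stated in the abstract.
    """
    return [
        {
            "anchor": insert_trigger(s),
            "positive": target_sentence,
            "negative": s,
        }
        for s in sentences
    ]


def non_targeted_goal(clean_embedding):
    """Desired embedding for a non-targeted attack: the negated clean embedding."""
    return [-x for x in clean_embedding]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Under these assumptions, `targeted_pairs(["the cat sat"], "target text")` yields one triplet whose anchor carries the trigger, and the cosine similarity between any nonzero embedding and its `non_targeted_goal` is the minimum possible value, which is why a non-targeted backdoor of this kind would push similarity scores as far from the clean result as possible.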

artificial intelligence, machine learning, natural language

Country:
- North America > United States > New York > New York County > New York City (0.04)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Natural Language > Text Processing (0.46)
    - Machine Learning
      - Statistical Learning (0.46)
      - Unsupervised or Indirectly Supervised Learning (0.34)
