Secure Transfer Learning: Training Clean Models Against Backdoor in (Both) Pre-trained Encoders and Downstream Datasets

Zhang, Yechao, Zhou, Yuxuan, Li, Tianyu, Li, Minghui, Hu, Shengshan, Luo, Wei, Zhang, Leo Yu

Apr-17-2025–arXiv.org Artificial Intelligence

Transfer learning from pre-trained encoders has become essential in modern machine learning, enabling efficient model adaptation across diverse tasks. However, this combination of pre-training and downstream adaptation creates an expanded attack surface, exposing models to sophisticated backdoor embeddings at both the encoder and dataset levels--an area often overlooked in prior research. Additionally, the limited computational resources typically available to users of pre-trained encoders constrain the effectiveness of generic backdoor defenses compared to end-to-end training from scratch. In this work, we investigate how to mitigate potential backdoor risks in resource-constrained transfer learning scenarios. Specifically, we conduct an exhaustive analysis of existing defense strategies, revealing that many follow a reactive workflow based on assumptions that do not scale to unknown threats, novel attack types, or different training paradigms. In response, we introduce a proactive mindset focused on identifying clean elements and propose the Trusted Core (T-Core) Bootstrapping framework, which emphasizes the importance of pinpointing trustworthy data and neurons to enhance model security. Our empirical evaluations demonstrate the effectiveness and superiority of T-Core, specifically assessing 5 encoder poisoning attacks, 7 dataset poisoning attacks, and 14 baseline defenses across five benchmark datasets, addressing four scenarios of 3 potential backdoor threats.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

Apr-17-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Performance Analysis > Accuracy (1.00)
  - Neural Networks > Deep Learning (1.00)
  - Statistical Learning (0.92)
  - Transfer Learning (0.81)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found