AITopics | pretrained language model

In natural language processing, the prevalent approach is to fine-tune pretrained language models (PLMs) on local samples. Recent research has exposed the susceptibility of PLMs to backdoor attacks, in which adversaries embed malicious prediction behaviors by manipulating a few training samples. In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model regardless of whether the fine-tuning dataset contains poisoned samples. To this end, we propose and integrate a honeypot module into the original PLM, specifically designed to absorb backdoor information exclusively. Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features while carrying minimal information about the original tasks. Consequently, we can impose penalties on the information acquired by the honeypot module to inhibit backdoor creation during the fine-tuning of the stem network. Comprehensive experiments on benchmark datasets substantiate the effectiveness and robustness of our defensive strategy. Notably, these results indicate a substantial reduction in the attack success rate, ranging from 10% to 40%, compared to prior state-of-the-art methods.

capturing and defeating backdoor, name change, pretrained language model, (6 more...)

Neural Information Processing Systems

Genre: Research Report (0.60)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

CoLLAT: On Adding Fine-grained Audio Understanding to Language Models using Token-Level Locked-Language Tuning

Neural Information Processing Systems | Dec-26-2025, 18:10:22 GMT

Humans can easily understand various audio concepts, but conventional audio classification models fail because they cannot predict classes unseen during training. To address this challenge, recent literature has explored contrastive language-audio pretraining, which learns an audio understanding model using natural language supervision from a pretrained language model. However, despite their reasonable zero-shot performance in audio understanding, these models typically fail to achieve optimal performance while preserving the text understanding capabilities of the pretrained language model. They also perform poorly when comprehending audio clips that contain multiple audio concepts.

collat, pretrained language model, token-level locked-language tuning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding. Junda Wu, Tong Yu, Rui Wang, Zhao Song

Can Unconditional Language Models Recover Arbitrary Sentences?

COMPACTER: Efficient Low-Rank Hypercomplex Adapter Layers

Towards Neural Programming Interfaces

Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots
