AITopics | backdoor feature

BAN: Detecting Backdoors Activated by Adversarial Neuron Noise

Neural Information Processing Systems, Mar-22-2026, 13:58:00 GMT

Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community. Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios. State-of-the-art backdoor inversion recovers a mask in the feature space to locate prominent backdoor features, where benign and backdoor features can be disentangled. However, it suffers from high computational overhead, and we also find that it overly relies on prominent backdoor features that are highly distinguishable from benign features. To tackle these shortcomings, this paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information. In particular, we adversarially increase the loss of backdoored models with respect to weights to activate the backdoor effect, based on which we can easily differentiate backdoored and clean models. Experimental results demonstrate our defense, BAN, is 1.37× (on CIFAR-10) and 5.11× (on ImageNet200) more efficient, with an average 9.99% higher detection success rate than the state-of-the-art defense BTI-DBF. Our code and trained models are publicly available at https://github.com/xiaoyunxxy/ban.
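
The idea of adversarially increasing the loss with respect to the weights can be illustrated with a toy sketch. This is not the BAN implementation: the one-layer logistic model, the multiplicative noise form w_j * (1 + delta_j), the bound `eps`, and the single signed-gradient step are all assumptions chosen to keep the example small. It only shows that a bounded perturbation of the weights, pointed along the loss gradient, raises the loss on the given data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights, inputs, labels):
    """Mean binary cross-entropy of a one-layer logistic model (toy stand-in)."""
    total = 0.0
    for x, y in zip(inputs, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(inputs)


def loss_gradient(weights, inputs, labels):
    """Gradient of the mean cross-entropy with respect to the weights."""
    grad = [0.0] * len(weights)
    for x, y in zip(inputs, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi / len(inputs)
    return grad


def adversarial_weight_noise(weights, inputs, labels, eps=0.3):
    """One signed ascent step on multiplicative noise (hypothetical helper).

    Each weight becomes w_j * (1 + delta_j) with |delta_j| <= eps, where the
    sign of delta_j follows d(loss)/d(delta_j) = grad_j * w_j at delta = 0.
    """
    grad = loss_gradient(weights, inputs, labels)
    noisy = []
    for w, g in zip(weights, grad):
        d = g * w
        delta = eps if d > 0 else (-eps if d < 0 else 0.0)
        noisy.append(w * (1.0 + delta))
    return noisy


# Illustrative data, not taken from the paper.
inputs = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]]
labels = [1, 1, 0, 0]
weights = [0.8, 0.6]

noisy_weights = adversarial_weight_noise(weights, inputs, labels)
print(loss(weights, inputs, labels), loss(noisy_weights, inputs, labels))
```

In a detection setting one would compare how differently suspect and reference models behave under such noise; that comparison is not shown here.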

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

cfaccbd9b5e62562779351ebcb140c94-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 05:40:39 GMT

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Security & Privacy (0.95)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Towards Stable Backdoor Purification through Feature Shift Tuning

Neural Information Processing Systems, Feb-17-2026, 20:20:25 GMT

Deep Neural Networks (DNNs) are shown vulnerable to various security threats.

artificial intelligence, machine learning, poisoning rate, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

BAN: Detecting Backdoors Activated by Adversarial Neuron Noise

Neural Information Processing Systems, Oct-10-2025, 17:08:36 GMT

Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community.

backdoor attack, backdoor feature, noise, (15 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

2b09bb02b90584e2be94ff3ae09289bc-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 21:48:58 GMT

dataset, effectiveness, ferrari, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
Asia > Malaysia (0.14)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (0.92)
Health & Medicine (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

ee37d51b3c003d89acba2363dde256af-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 11:05:08 GMT

artificial intelligence, machine learning, poisoning rate, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Security & Privacy (0.95)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models

Yu, Miao; Zhou, Zhenhong; Aloqaily, Moayad; Wang, Kun; Huang, Biwei; Wang, Stephen; Jin, Yueming; Wen, Qingsong

arXiv.org Artificial Intelligence, Oct-1-2025

Fine-tuned Large Language Models (LLMs) are vulnerable to backdoor attacks through data poisoning, yet the internal mechanisms governing these attacks remain a black box. Previous research on interpretability for LLM safety tends to focus on alignment, jailbreak, and hallucination, but overlooks backdoor mechanisms, making it difficult to understand and fully eliminate the backdoor threat. In this paper, aiming to bridge this gap, we explore the interpretable mechanisms of LLM backdoors through Backdoor Attribution (BkdAttr), a tripartite causal analysis framework. We first introduce the Backdoor Probe, which proves the existence of learnable backdoor features encoded within the representations. Building on this insight, we further develop Backdoor Attention Head Attribution (BAHA), efficiently pinpointing the specific attention heads responsible for processing these features. Our primary experiments reveal that these heads are relatively sparse; ablating a minimal ~3% of total heads is sufficient to reduce the Attack Success Rate (ASR) by over 90%. More importantly, we further employ these findings to construct the Backdoor Vector, derived from these attributed heads, as a master controller for the backdoor. Through only a 1-point intervention on a single representation, the vector can either boost ASR up to ~100% (↑) on clean inputs, or completely neutralize the backdoor, suppressing ASR down to ~0% (↓) on triggered inputs. In conclusion, our work pioneers the exploration of mechanistic interpretability in LLM backdoors, demonstrating a powerful method for backdoor control and revealing actionable insights for the community.
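
A single-point intervention with a steering vector can be sketched as follows. This is an illustration only, not the BkdAttr method: the abstract says the Backdoor Vector is derived from the attributed attention heads, whereas this sketch substitutes a simple difference of mean representations, and every function name, the two-dimensional toy representations, and the `strength` parameter are hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(triggered, clean):
    """Assumed stand-in for a backdoor direction: mean(triggered) - mean(clean)."""
    mt, mc = mean_vector(triggered), mean_vector(clean)
    return [a - b for a, b in zip(mt, mc)]


def intervene(representation, vector, strength):
    """Shift one representation along the vector; negative strength suppresses."""
    return [r + strength * v for r, v in zip(representation, vector)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy representations (illustrative values, not model activations).
clean = [[0.0, 1.0], [0.2, 0.8]]
triggered = [[1.0, 1.0], [1.2, 0.8]]

vector = difference_of_means(triggered, clean)
boosted = intervene(clean[0], vector, strength=1.0)         # push toward backdoor
suppressed = intervene(triggered[0], vector, strength=-1.0)  # push away from it
print(dot(boosted, vector), dot(suppressed, vector))
```

Adding the vector increases the projection of a clean representation onto the assumed backdoor direction, and subtracting it decreases the projection of a triggered one; whether that changes model behaviour depends on the model and layer, which this sketch does not cover.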

backdoor, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.21761

Country: Asia (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Sports (0.68)
Information Technology > Security & Privacy (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Clustered Federated Learning via Embedding Distributions

Zhang, Dekai; Williams, Matthew; Toni, Francesca

arXiv.org Artificial Intelligence, Jun-10-2025

Federated learning (FL) is a widely used framework for machine learning in distributed data environments where clients hold data that cannot be easily centralised, such as for data protection reasons. FL, however, is known to be vulnerable to non-IID data. Clustered FL addresses this issue by finding more homogeneous clusters of clients. We propose a novel one-shot clustering method, EMD-CFL, using the Earth Mover's distance (EMD) between data distributions in embedding space. We theoretically motivate the use of EMDs using results from the domain adaptation literature and demonstrate empirically superior clustering performance in extensive comparisons against 16 baselines and on a range of challenging datasets.
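
Clustering clients by the Earth Mover's distance between their embedding distributions can be sketched in a few lines. This is a simplified illustration, not EMD-CFL itself: it assumes one-dimensional embeddings and equal sample sizes per client (where the EMD reduces to the mean absolute difference of the sorted samples), and the threshold-based grouping rule and the `threshold` value are assumptions made for the example.

```python
def emd_1d(a, b):
    """Earth Mover's distance between two equal-size 1-D samples.

    For equal-size empirical distributions on the line, the optimal transport
    plan matches sorted values, so the distance is the mean absolute gap.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def cluster_clients(client_embeddings, threshold):
    """Group clients linked by pairwise EMD below the threshold (hypothetical rule).

    Returns one integer cluster label per client, computed in a single pass
    over connected components of the "close enough" graph.
    """
    n = len(client_embeddings)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and emd_1d(
                    client_embeddings[i], client_embeddings[j]
                ) < threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Four toy clients: two near 0 and two near 5 (illustrative values).
clients = [
    [0.0, 0.1, 0.2],
    [0.05, 0.15, 0.25],
    [5.0, 5.1, 5.2],
    [5.05, 5.1, 5.3],
]
print(cluster_clients(clients, threshold=1.0))
```

Real embeddings are multi-dimensional, where the EMD requires solving an optimal transport problem instead of a sort.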

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2506.07769

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

BAN: Detecting Backdoors Activated by Adversarial Neuron Noise

Neural Information Processing Systems, May-27-2025, 17:19:03 GMT

adversarial neuron noise, backdoor feature, detecting backdoor activated, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.81)

Towards Backdoor Stealthiness in Model Parameter Space

Xu, Xiaoyun; Liu, Zhuoran; Koffas, Stefanos; Picek, Stjepan

arXiv.org Artificial Intelligence, Jan-10-2025

Recent research on backdoor stealthiness focuses mainly on indistinguishable triggers in input space and inseparable backdoor representations in feature space, aiming to circumvent backdoor defenses that examine these respective spaces. However, existing backdoor attacks are typically designed to resist a specific type of backdoor defense without considering the diverse range of defense mechanisms. Based on this observation, we pose a natural question: Are current backdoor attacks truly a real-world threat when facing diverse practical defenses? To answer this question, we examine 12 common backdoor attacks that focus on input-space or feature-space stealthiness and 17 diverse representative defenses. Surprisingly, we reveal a critical blind spot: Backdoor attacks designed to be stealthy in input and feature spaces can be mitigated by examining backdoored models in parameter space. To investigate the underlying causes behind this common vulnerability, we study the characteristics of backdoor attacks in the parameter space. Notably, we find that input- and feature-space attacks introduce prominent backdoor-related neurons in parameter space, which are not thoroughly considered by current backdoor attacks. Taking comprehensive stealthiness into account, we propose a novel supply-chain attack called Grond. Grond limits the parameter changes by a simple yet effective module, Adversarial Backdoor Injection (ABI), which adaptively increases the parameter-space stealthiness during the backdoor injection. Extensive experiments demonstrate that Grond outperforms all 12 backdoor attacks against state-of-the-art (including adaptive) defenses on CIFAR-10, GTSRB, and a subset of ImageNet. In addition, we show that ABI consistently improves the effectiveness of common backdoor attacks.

artificial intelligence, backdoor attack, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2501.05928

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Asia > Nepal (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
