defense strategy
Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack
Deep neural networks face persistent challenges in defending against backdoor attacks, leading to an ongoing battle between attacks and defenses. While existing backdoor defense strategies have shown promising performance in reducing attack success rates, can we confidently claim that the backdoor threat has truly been eliminated from the model? To address this question, we re-investigate the characteristics of backdoored models after defense (denoted as defense models). Surprisingly, we find that the original backdoors still exist in defense models derived from existing post-training defense strategies, where backdoor existence is measured by a novel metric called the backdoor existence coefficient. This implies that the backdoors merely lie dormant rather than being eliminated. To further verify this finding, we empirically show that these dormant backdoors can be easily re-activated during the inference stage by manipulating the original trigger with a well-designed tiny perturbation computed via a universal adversarial attack. More practically, we extend backdoor re-activation to the black-box scenario, where the adversary can only query the defense model during the inference stage, and develop two effective methods, i.e., query-based and transfer-based backdoor re-activation attacks. The effectiveness of the proposed methods is verified on both image classification and multimodal contrastive learning (i.e., CLIP) tasks. In conclusion, this work uncovers a critical vulnerability that has never been explored by existing defense strategies, emphasizing the urgency of designing more robust and advanced backdoor defense mechanisms in the future.
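To make the white-box variant of this attack concrete, here is a minimal sketch of learning a universal perturbation that re-activates a dormant backdoor. Everything here is an illustrative assumption, not the paper's code: `model` is the defense model, `apply_trigger` stamps the original trigger, `target_class` is the attacker's label, and the CIFAR-like input shape is arbitrary.

```python
# Hypothetical sketch: optimize a tiny universal perturbation delta so that
# trigger-stamped inputs are again classified as the attacker's target class.
import torch
import torch.nn.functional as F

def reactivation_perturbation(model, loader, apply_trigger, target_class,
                              eps=4 / 255, steps=100, lr=0.01, device="cpu"):
    model.eval()
    # one perturbation shared by all inputs (CIFAR-like shape assumed)
    delta = torch.zeros(1, 3, 32, 32, device=device, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        for x, _ in loader:
            x = apply_trigger(x.to(device))          # stamp the original trigger
            logits = model((x + delta).clamp(0, 1))  # perturbed triggered input
            target = torch.full((x.size(0),), target_class, device=device)
            loss = F.cross_entropy(logits, target)   # push toward target class
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():                    # keep the perturbation tiny
                delta.clamp_(-eps, eps)
    return delta.detach()
```

The query-based and transfer-based black-box variants described above would replace the gradient step with score-based estimates or a surrogate model, respectively.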
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
Large Language Models (LLMs) are increasingly deployed in various applications. As their usage grows, concerns regarding their safety are rising, especially their ability to maintain harmless responses when faced with malicious instructions. Many defense strategies have been developed to enhance the safety of LLMs. However, our research finds that existing defense strategies lead LLMs to adopt a predominantly rejection-oriented stance, thereby diminishing the usability of their responses to benign instructions. To solve this problem, we introduce the MoGU framework, designed to enhance LLMs' safety while preserving their usability. MoGU transforms the base LLM into two variants, a usable LLM and a safe LLM, and employs dynamic routing to balance their contributions. When encountering malicious instructions, the router assigns a higher weight to the safe LLM to ensure that responses are harmless.
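A minimal sketch of the gating idea is below, under the assumption that the router mixes the two variants at the logit level; the abstract does not specify the routing granularity or how the router is trained, and the HF-style `.logits` access is also an assumption.

```python
# Illustrative gate between a "usable" and a "safe" LLM variant; not MoGU's code.
import torch
import torch.nn as nn

class GatedSafetyRouter(nn.Module):
    def __init__(self, usable_llm, safe_llm, hidden_size):
        super().__init__()
        self.usable_llm = usable_llm           # variant tuned for helpfulness
        self.safe_llm = safe_llm               # variant tuned for harmlessness
        self.gate = nn.Linear(hidden_size, 1)  # scores how risky the prompt looks

    def forward(self, input_ids, prompt_embedding):
        # w -> 1 for malicious-looking prompts, so the safe variant dominates;
        # w -> 0 for benign prompts, preserving usability
        w = torch.sigmoid(self.gate(prompt_embedding))
        usable_logits = self.usable_llm(input_ids).logits
        safe_logits = self.safe_llm(input_ids).logits
        return w * safe_logits + (1 - w) * usable_logits
```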
Reimagining Mutual Information for Enhanced Defense against Data Leakage in Collaborative Inference
Edge-cloud collaborative inference empowers resource-limited IoT devices to support deep learning applications without disclosing their raw data to the cloud server, thus protecting users' data. Nevertheless, prior research has shown that collaborative inference still exposes the inputs and predictions of edge devices. To defend against such data leakage, we introduce InfoScissors, a defense strategy designed to reduce the mutual information between a model's intermediate outputs and the device's inputs and predictions. We evaluate our defense on several datasets against diverse attacks. Beyond the empirical comparison, we provide a theoretical analysis of the inadequacies of recent defense strategies that also utilize mutual information, particularly those based on the Variational Information Bottleneck (VIB) approach, and use it to illustrate the superiority of our method.
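The general training pattern the abstract describes can be sketched as follows. Here the mutual information between the intermediate features and the input is proxied adversarially by how well a decoder can reconstruct the input from them; InfoScissors' actual estimator may differ, and all names (`edge`, `cloud`, `decoder`) are illustrative.

```python
# Generic MI-reduction sketch for collaborative inference; not InfoScissors itself.
import torch.nn.functional as F

def training_step(edge, cloud, decoder, opt_model, opt_dec, x, y, lam=0.5):
    # z is the intermediate representation the device would send to the cloud
    z = edge(x)
    # Step 1: the decoder (a stand-in MI estimator) learns to invert z -> x
    opt_dec.zero_grad()
    F.mse_loss(decoder(z.detach()), x).backward()
    opt_dec.step()
    # Step 2: edge + cloud minimize the task loss while making z hard to
    # invert, i.e. pushing down the (proxied) mutual information MI(z, x)
    opt_model.zero_grad()
    task_loss = F.cross_entropy(cloud(z), y)
    leak_loss = -F.mse_loss(decoder(z), x)   # reward poor reconstruction
    (task_loss + lam * leak_loss).backward()
    opt_model.step()
    return task_loss.item()
```

A symmetric penalty on MI between the features and the predictions would follow the same adversarial pattern with a label-inference head in place of the decoder.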
Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork
Deep neural networks (DNNs) are vulnerable to backdoor attacks. Previous works have shown that it is extremely challenging to unlearn the undesired backdoor behavior from the network, since the entire network can be affected by the backdoor samples. In this paper, we propose a brand-new backdoor defense strategy, Trap and Replace, which makes it much easier to remove the harmful influence of backdoor samples from the model. Our defense strategy consists of two stages.
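The abstract does not detail the two stages, so the following is only a structural sketch of one plausible reading: a small, disposable "bait" head absorbs the backdoor while a shared stem stays general-purpose (e.g., via an auxiliary reconstruction objective), after which the bait head is discarded and a fresh head is retrained on a small clean set. The module names and auxiliary loss are assumptions.

```python
# Hypothetical structure for a trap-and-replace style defense; not the paper's code.
import torch.nn as nn

class TrapAndReplaceNet(nn.Module):
    def __init__(self, stem, bait_head, recon_head):
        super().__init__()
        self.stem = stem        # shared backbone, kept after stage 1
        self.head = bait_head   # lightweight classifier meant to trap the backdoor
        self.recon = recon_head # auxiliary decoder keeping stem features generic

    def forward(self, x):
        z = self.stem(x)
        return self.head(z), self.recon(z)

def replace_head(net, fresh_head):
    # Stage 2: drop the trapped subnetwork and fine-tune a clean replacement
    # on a small set of trusted data.
    net.head = fresh_head
    for p in net.stem.parameters():
        p.requires_grad = False  # optionally freeze the (mostly clean) stem
    return net
```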
AlignTree: Efficient Defense Against LLM Jailbreak Attacks
Gil Goren, Shahar Katz, Lior Wolf
Large Language Models (LLMs) are vulnerable to adversarial attacks that bypass safety guidelines and generate harmful content. Mitigating these vulnerabilities requires defense mechanisms that are both robust and computationally efficient. However, existing approaches either incur high computational costs or rely on lightweight defenses that can be easily circumvented, rendering them impractical for real-world LLM-based systems. In this work, we introduce the AlignTree defense, which enhances model alignment while maintaining minimal computational overhead. AlignTree monitors LLM activations during generation and detects misaligned behavior using an efficient random forest classifier. This classifier operates on two signals: (i) the refusal direction -- a linear representation that activates on misaligned prompts, and (ii) an SVM-based signal that captures non-linear features associated with harmful content. Unlike previous methods, AlignTree does not require additional prompts or auxiliary guard models. Through extensive experiments, we demonstrate the efficiency and robustness of AlignTree across multiple LLMs and benchmarks.
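A minimal sketch of the two-signal detector is below, using sklearn stand-ins. The refusal direction `r`, the activation extraction, and the training data are assumptions; only the overall recipe (linear refusal-direction score plus an SVM signal feeding a random forest) comes from the abstract.

```python
# Illustrative two-signal misalignment detector; not AlignTree's implementation.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def fit_detector(acts, labels, refusal_dir):
    # acts: (n, d) hidden activations; labels: 1 = misaligned, 0 = benign
    proj = acts @ refusal_dir                  # signal (i): refusal-direction score
    svm = SVC(kernel="rbf").fit(acts, labels)  # signal (ii): non-linear SVM
    svm_score = svm.decision_function(acts)
    feats = np.stack([proj, svm_score], axis=1)
    forest = RandomForestClassifier(n_estimators=100).fit(feats, labels)

    def detect(a):                             # a: (d,) activation during generation
        f = np.array([[a @ refusal_dir,
                       svm.decision_function(a[None])[0]]])
        return forest.predict(f)[0]            # 1 -> flag generation as misaligned
    return detect
```

Monitoring then reduces to calling `detect` on activations as tokens are generated, which keeps the added cost to two cheap classifier evaluations per check rather than an auxiliary guard model.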