AITopics | poisoning


poisoning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis


A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers

Neural Information Processing Systems | Jun-23-2026, 03:53:09 GMT

Poison-only Clean-label Backdoor Attacks (PCBAs) aim to covertly inject attacker-desired behavior into DNNs by merely poisoning the dataset without changing the labels. To effectively implant a backdoor, multiple triggers are proposed for various attack requirements of Attack Success Rate (ASR) and stealthiness. Additionally, sample selection enhances clean-label backdoor attacks' ASR by meticulously selecting "hard" samples instead of random samples to poison. Current methods, however, usually handle the sample selection and triggers in isolation, leading to limited performance on both ASR and stealthiness when converted to PCBAs. Therefore, we seek to explore the bi-directional collaborative relations between the sample selection and triggers to address the above dilemma.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Asia > China > Guangdong Province (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(3 more...)


Provable Watermarking for Data Poisoning Attacks

Neural Information Processing Systems | Jun-23-2026, 03:42:52 GMT

In recent years, data poisoning attacks have been increasingly designed to appear harmless and even beneficial, often with the intention of verifying dataset ownership or safeguarding private data from unauthorized use. However, these developments have the potential to cause misunderstandings and conflicts, as data poisoning has traditionally been regarded as a security threat to machine learning systems. To address this issue, it is imperative for harmless poisoning generators to claim ownership of their generated datasets, enabling users to identify potential poisoning to prevent misuse. In this paper, we propose the deployment of watermarking schemes as a solution to this challenge. We introduce two provable and practical watermarking approaches for data poisoning: post-poisoning watermarking and poisoning-concurrent watermarking. Our analyses demonstrate that when the watermarking length is Θ(√d/ϵ_w) for post-poisoning watermarking, and falls within the range of Θ(1/ϵ_w²) to O(√d/ϵ_p) for poisoning-concurrent watermarking, the watermarked poisoning dataset provably ensures both watermarking detectability and poisoning utility, certifying the practicality of watermarking under data poisoning attacks.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)


Agnostic Learning under Targeted Poisoning: Optimal Rates and the Role of Randomness

Neural Information Processing Systems | Jun-21-2026, 00:31:51 GMT

We study the problem of learning in the presence of an adversary that can corrupt an η fraction of the training examples with the goal of causing failure on a specific test point. In the realizable setting, prior work established that the optimal error under such instance-targeted poisoning attacks scales as Θ(dη), where d is the VC dimension of the hypothesis class [Hanneke, Karbasi, Mahmoody, Mehalel, and Moran (NeurIPS 2022)]. In this work, we resolve the corresponding question in the agnostic setting. We show that the optimal excess error is Θ̃(√(dη)), answering one of the main open problems left by Hanneke et al. To achieve this rate, it is necessary to use randomized learners: Hanneke et al. showed that deterministic learners can be forced to suffer error close to 1 even under small amounts of poisoning.
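As a rough numerical illustration of the two rates discussed above, the sketch below compares a linear rate dη (realizable setting) with a square-root rate √(dη), which is how the agnostic excess-error rate is read here. This is a hedged sketch only: constants and logarithmic factors hidden by the Θ notation are ignored, and the function names and the sample values of d and η are hypothetical choices for illustration.

```python
import math


def realizable_rate(d: int, eta: float) -> float:
    """Leading-order realizable error d * eta, capped at 1.

    Constants hidden by the Theta notation are omitted (assumption).
    """
    return min(1.0, d * eta)


def agnostic_excess_rate(d: int, eta: float) -> float:
    """Leading-order agnostic excess error sqrt(d * eta), capped at 1.

    Constants and logarithmic factors are omitted (assumption).
    """
    return min(1.0, math.sqrt(d * eta))


if __name__ == "__main__":
    d = 10  # hypothetical VC dimension
    for eta in (1e-4, 1e-3, 1e-2):  # hypothetical poisoning fractions
        print(
            f"eta={eta:g}  d*eta={realizable_rate(d, eta):.4f}  "
            f"sqrt(d*eta)={agnostic_excess_rate(d, eta):.4f}"
        )
```

Whenever dη is below 1, the square-root rate is the larger of the two, so under this reading a small poisoning budget costs noticeably more in the agnostic setting than in the realizable one.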

artificial intelligence, learner, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Asia > Middle East > Israel (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)


MIBP-Cert: Certified Training against Data Perturbations with Mixed-Integer Bilinear Programs

Neural Information Processing Systems | Jun-19-2026, 15:35:15 GMT

Data errors, corruptions, and poisoning attacks during training pose a major threat to the reliability of modern AI systems. While extensive effort has gone into empirical mitigations, the evolving nature of attacks and the complexity of data require a more principled, provable approach to robustly learn on such data--and to understand how perturbations influence the final model. Hence, we introduce MIBP-Cert, a novel certification method based on mixed-integer bilinear programming (MIBP) that computes sound, deterministic bounds to provide provable robustness even under complex threat models. By computing the set of parameters reachable through perturbed or manipulated data, we can predict all possible outcomes and guarantee robustness. To make solving this optimization problem tractable, we propose a novel relaxation scheme that bounds each training step without sacrificing soundness. We demonstrate the applicability of our approach to continuous and discrete data, as well as different threat models--including complex ones that were previously out of reach.

artificial intelligence, constraint, machine learning, (20 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


7ff65a57e916785a271d97f7236f1323-Paper-Conference.pdf

Neural Information Processing Systems | Jun-19-2026, 00:55:53 GMT

Membership inference tests aim to determine whether a particular data point was included in a language model's training set. However, recent works have shown that such tests often fail under the strict definition of membership based on exact matching, and have suggested relaxing this definition to include semantic neighbors as members as well. In this work, we show that membership inference tests are still unreliable under this relaxation -- it is possible to poison the training dataset in a way that causes the test to produce incorrect predictions for a target point. We theoretically reveal a trade-off between a test's accuracy and its robustness to poisoning. We also present a concrete instantiation of this poisoning attack and empirically validate its effectiveness. Our results show that it can degrade the performance of existing tests to well below random.
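For orientation, a membership inference test can be as simple as thresholding the model's loss on the candidate text. The sketch below shows that common baseline; it is not necessarily one of the tests studied in this paper, and the per-token probabilities, the threshold, and the function names are hypothetical values chosen for illustration.

```python
import math
from typing import Sequence


def mean_negative_log_likelihood(token_probs: Sequence[float]) -> float:
    """Average negative log-probability the model assigns to each token."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def loss_threshold_test(token_probs: Sequence[float], threshold: float) -> bool:
    """Predict 'member' when the average loss falls below the threshold.

    The intuition is that a model tends to assign lower loss to text it
    was trained on. The threshold is a hypothetical tuning parameter.
    """
    return mean_negative_log_likelihood(token_probs) < threshold


if __name__ == "__main__":
    confident = [0.9, 0.8, 0.95]   # hypothetical per-token probabilities
    uncertain = [0.1, 0.2, 0.05]
    print(loss_threshold_test(confident, threshold=0.5))
    print(loss_threshold_test(uncertain, threshold=0.5))
```

Because the decision depends only on how well the model fits the target text, anything in the training set that shifts that fit can also shift the test's verdict.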

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.86)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

Neural Information Processing Systems | Jun-17-2026, 23:22:10 GMT

Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often apply aggressive filtering, leading to unnecessary loss of valuable information and reduced reliability in generation. To address this problem, we propose a two-stage semantic filtering and conflict-free framework for trustworthy RAG. In the first stage, we apply joint semantic and cluster-based filtering, guided by the Entity-intent-relation extractor (EIRE). EIRE extracts entities, latent objectives, and entity relations from both the user query and filtered documents, scores their semantic relevance, and selectively adds valuable documents into the clean retrieval database. In the second stage, we propose an EIRE-guided conflict-aware filtering module, which analyzes semantic consistency between the query, candidate answers, and retrieved knowledge before final answer generation, filtering out internal and external contradictions that could mislead the model. Through this two-stage process, SeCon-RAG effectively preserves useful knowledge while mitigating conflict contamination, achieving significant improvements in both generation robustness and output trustworthiness. Extensive experiments across various LLMs and datasets demonstrate that the proposed SeCon-RAG markedly outperforms state-of-the-art defense methods.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Asia > China (0.28)
North America > United States (0.28)
Asia > Singapore (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Media (0.92)
Transportation > Air (0.68)
Government > Military (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels

Neural Information Processing Systems | Jun-17-2026, 14:08:59 GMT

Neural networks are powered by an implicit bias: a tendency of gradient descent to fit training data in a way that generalizes to unseen data. A recent class of neural network models gaining increasing popularity is structured state space models (SSMs). Prior work argued that the implicit bias of SSMs leads to generalization in a setting where data is generated by a low dimensional teacher. In this paper, we revisit the latter setting, and formally establish a phenomenon entirely undetected by prior work on the implicit bias of SSMs. Namely, we prove that while implicit bias leads to generalization under many choices of training data, there exist special examples whose inclusion in training completely distorts the implicit bias, to a point where generalization fails. This failure occurs despite the special training examples being labeled by the teacher, i.e., having clean labels! We empirically demonstrate the phenomenon, with SSMs trained independently and as part of non-linear neural networks. In the area of adversarial machine learning, disrupting generalization with cleanly labeled training examples is known as clean-label poisoning. Given the proliferation of SSMs, we believe that delineating their susceptibility to clean-label poisoning, and developing methods for overcoming this susceptibility, are critical research directions to pursue.

artificial intelligence, experiment, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)


BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models

Neural Information Processing Systems | Jun-11-2026, 03:08:01 GMT

Generative large language models (LLMs) have achieved state-of-the-art results on a wide range of tasks, yet they remain susceptible to backdoor attacks: carefully crafted triggers in the input can manipulate the model to produce adversary-specified outputs. While prior research has predominantly focused on backdoor risks in vision and classification settings, the vulnerability of LLMs in open-ended text generation remains underexplored.

artificial intelligence, large language model, natural language, (6 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.42)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)


Sageflow: Robust Federated Learning against Both Stragglers and Adversaries (Supplementary Material)

Neural Information Processing Systems | Apr-24-2026, 12:48:35 GMT

A.1 Scenario with only stragglers

The hyperparameter settings for Sageflow are shown in Table 1. For the "ignore stragglers" and "wait for stragglers" schemes combined with FedAvg, we decayed the learning rate during training. For the FedAsync scheme of [7], we take a polynomial strategy with hyperparameters a = 0.5, α = 0.8, and decayed γ during training.

A.2 Scenario with only adversaries

Data poisoning and model poisoning attacks: Table 2 describes the hyperparameters for Sageflow with only adversaries, under data poisoning and model poisoning attacks. For RFA of [5], the maximum iteration is set to 10. In this setup, the learning rate is decayed for all three schemes (Sageflow, RFA, FedAvg).
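To make the FedAsync polynomial strategy from A.1 more concrete, the following sketch shows how the hyperparameters a = 0.5 and α = 0.8 could enter a staleness-weighted server update. It assumes the commonly described FedAsync form, in which the mixing weight is α multiplied by (staleness + 1)^(-a); the function names and the toy weight vectors are hypothetical, and the role of γ is not modeled here.

```python
from typing import List


def polynomial_staleness(staleness: int, a: float = 0.5) -> float:
    """Polynomial staleness discount (staleness + 1) ** (-a).

    Assumed form of the 'polynomial strategy'; a fresh update
    (staleness 0) receives a discount of exactly 1.
    """
    return (staleness + 1) ** (-a)


def fedasync_mix(
    global_w: List[float],
    client_w: List[float],
    staleness: int,
    alpha: float = 0.8,
    a: float = 0.5,
) -> List[float]:
    """Blend a possibly stale client model into the global model.

    The effective mixing weight is alpha * polynomial_staleness(...),
    so staler updates move the global model less.
    """
    alpha_t = alpha * polynomial_staleness(staleness, a)
    return [(1 - alpha_t) * g + alpha_t * c for g, c in zip(global_w, client_w)]


if __name__ == "__main__":
    global_w = [0.0, 0.0]  # toy global weights (hypothetical)
    client_w = [1.0, 2.0]  # toy client weights (hypothetical)
    for staleness in (0, 3, 8):
        print(staleness, fedasync_mix(global_w, client_w, staleness))
```

With these settings a fresh update is mixed in with weight 0.8, while an update that is three rounds stale is mixed in with weight 0.4.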

adversary, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry:

Health & Medicine (0.48)
Information Technology > Security & Privacy (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

