AITopics | undetectability

Edit Distance Robust Watermarks for Language Models

Neural Information Processing Systems, Feb-9-2026, 14:25:41 GMT

Earlier schemes could only handle stochastic substitutions and deletions, and thus we are aiming for a more natural and appealing robustness guarantee that holds with respect to edit distance.

artificial intelligence, natural language, prc, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Berlin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.45)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

24c53bfa5b53fc2cf05644f5a7a26bb0-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 21:06:54 GMT

negl, prc, substitution, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.92)

Industry: Information Technology > Security & Privacy (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps

Abdalla, Pedro, Vershynin, Roman

arXiv.org Artificial Intelligence, Jun-26-2025

Large Language Models (LLMs) have emerged as a powerful technology for generating human-like text [3, 20]. On the one hand, an LLM performs well if it produces text that closely resembles human writing. On the other hand, the use of high-performance LLMs also brings undesirable consequences such as the spread of misinformation [19], misuse in education [15, 20], and data pollution [17, 18]. In this context, there is an urgent need to develop methods that distinguish human-written from AI-generated text in order to mitigate those outcomes. One prominent technique is the so-called watermarking approach, in which the goal is to embed a detectable signal in the text generated by the LLM.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.01484

Genre:

Workflow (0.68)
Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

An undetectable watermark for generative image models

Gunn, Sam, Zhao, Xuandong, Song, Dawn

arXiv.org Artificial Intelligence, Nov-15-2024

We present the first undetectable watermarking scheme for generative image models. Undetectability ensures that no efficient adversary can distinguish between watermarked and un-watermarked images, even after making many adaptive queries. In particular, an undetectable watermark does not degrade image quality under any efficiently computable metric. Our scheme works by selecting the initial latents of a diffusion model using a pseudorandom error-correcting code (Christ and Gunn, 2024), a strategy which guarantees undetectability and robustness. We experimentally demonstrate that our watermarks are quality-preserving and robust using Stable Diffusion 2.1. Our experiments verify that, in contrast to every prior scheme we tested, our watermark does not degrade image quality. Our experiments also demonstrate robustness: existing watermark removal attacks fail to remove our watermark from images without significantly degrading the quality of the images. Finally, we find that we can robustly encode 512 bits in our watermark, and up to 2500 bits when the images are not subjected to watermark removal attacks. Our code is available at https://github.com/XuandongZhao/PRC-Watermark.
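The idea of choosing a diffusion model's initial latents from a codeword can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each codeword bit fixes the sign of one Gaussian latent (a detail the abstract does not state), and it uses plain random bits as a stand-in for a real pseudorandom code. The function names `sample_watermarked_latents` and `read_signs` are hypothetical.

```python
import math
import random

def sample_watermarked_latents(codeword_bits, rng):
    """Return one latent per bit: magnitude |N(0,1)|, sign taken from the bit."""
    latents = []
    for bit in codeword_bits:
        magnitude = abs(rng.gauss(0.0, 1.0))
        # Assumption: bit 1 maps to a negative latent, bit 0 to a positive one.
        latents.append(-magnitude if bit else magnitude)
    return latents

def read_signs(latents):
    """Recover the bit pattern from the signs of the latents."""
    return [1 if math.copysign(1.0, z) < 0 else 0 for z in latents]

rng = random.Random(0)
# Stand-in for a pseudorandom codeword: plain random bits, NOT a real PRC.
codeword = [rng.getrandbits(1) for _ in range(64)]
latents = sample_watermarked_latents(codeword, rng)
recovered = read_signs(latents)
```

Because a standard Gaussian is symmetric around zero, a uniformly random sign combined with an absolute Gaussian magnitude is again a standard Gaussian. That is why, under this assumption, bits that look random to an efficient observer yield latents that look like ordinary initial noise. A real detector would first have to estimate the latents from the image, which this sketch omits.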

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.07369

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Planting Undetectable Backdoors in Machine Learning Models

Goldwasser, Shafi, Kim, Michael P., Vaikuntanathan, Vinod, Zamir, Or

arXiv.org Artificial Intelligence, Nov-9-2024

Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that given black-box access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm or in Random ReLU networks. In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is "clean" or contains a backdoor. Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, our construction can produce a classifier that is indistinguishable from an "adversarially robust" classifier, but where every input has an adversarial example! In summary, the existence of undetectable backdoors represents a significant theoretical roadblock to certifying adversarial robustness.
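The first, black-box framework can be illustrated with a toy wrapper around an arbitrary classifier. This is only a sketch under stated assumptions: it substitutes an HMAC tag (a symmetric primitive from the Python standard library) for the public-key digital signatures the abstract refers to, and `original_model`, `make_backdoored`, `activate`, and `TAG_LEN` are hypothetical names invented for the example.

```python
import hashlib
import hmac

TAG_LEN = 16  # number of trailing input bytes reserved for the tag (assumption)

def original_model(x: bytes) -> int:
    """Hypothetical toy classifier: parity of the byte sum."""
    return sum(x) % 2

def _tag(key: bytes, body: bytes) -> bytes:
    return hmac.new(key, body, hashlib.sha256).digest()[:TAG_LEN]

def make_backdoored(model, key: bytes):
    """Wrap `model` so that inputs carrying a valid tag get the flipped label."""
    def backdoored(x: bytes) -> int:
        body, tag = x[:-TAG_LEN], x[-TAG_LEN:]
        if len(x) > TAG_LEN and hmac.compare_digest(tag, _tag(key, body)):
            return 1 - model(x)  # backdoor fires
        return model(x)          # otherwise identical to the original model
    return backdoored

def activate(x: bytes, key: bytes) -> bytes:
    """Perturb only the last TAG_LEN bytes of x so that the backdoor fires."""
    body = x[:-TAG_LEN]
    return body + _tag(key, body)

key = b"demo-key"
backdoored = make_backdoored(original_model, key)
x = bytes(range(64))
x_triggered = activate(x, key)
```

Without the key, finding an input on which `backdoored` and `original_model` disagree amounts to forging a valid tag, which mirrors the black-box guarantee described above. With the key, overwriting a few trailing bytes is enough to flip the label the wrapped model would otherwise assign.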

artificial intelligence, backdoor, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2204.06974

Country:

North America > United States > Maryland > Baltimore (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)

Watermarking Language Models for Many Adaptive Users

Cohen, Aloni, Hoover, Alexander, Schoenbach, Gabe

arXiv.org Artificial Intelligence, Jun-28-2024

We study watermarking schemes for language models with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting: when a user queries a language model more than once, as even benign users do. And with just a single exception (Christ and Gunn, 2024), prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but no additional information can be extracted from the watermark. Unfortunately, merely detecting AI-generated text may not prevent future abuses. We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users, even in the face of adaptive prompting. We construct multi-user watermarking schemes from undetectable, adaptively robust, zero-bit watermarking schemes (and prove that the undetectable zero-bit scheme of Christ, Gunn, and Zamir (2024) is adaptively robust). Importantly, our scheme provides both zero-bit and multi-user assurances at the same time. It detects shorter snippets just as well as the original scheme, and traces longer excerpts to individuals. The main technical component is a construction of message-embedding watermarks from zero-bit watermarks. Ours is the first generic reduction between watermarking schemes for language models. A challenge for such reductions is the lack of a unified abstraction for robustness -- that marked text is detectable even after edits. We introduce a new unifying abstraction called AEB-robustness. AEB-robustness provides that the watermark is detectable whenever the edited text "approximates enough blocks" of model-generated output.
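One simple way a reduction from zero-bit to message-embedding watermarks can look is to derive a separate zero-bit key for every (position, bit) pair and mark each block with the key that matches the message bit. The sketch below is an illustration of that general idea, not the paper's construction. The toy zero-bit scheme (keyed rejection sampling over a hypothetical vocabulary of 1000 tokens) is a stand-in that is neither undetectable nor robust, and every name in it is an assumption.

```python
import hashlib
import hmac
import random

VOCAB = 1000  # hypothetical vocabulary size
BLOCK = 32    # tokens per block

def is_green(key: bytes, token: int) -> bool:
    """Keyed pseudorandom split of the vocabulary into two halves."""
    digest = hmac.new(key, str(token).encode(), hashlib.sha256).digest()
    return digest[0] & 1 == 1

def zero_bit_generate(key: bytes, rng) -> list:
    """Toy zero-bit watermark: emit only tokens that are green under `key`."""
    block = []
    while len(block) < BLOCK:
        token = rng.randrange(VOCAB)
        if is_green(key, token):
            block.append(token)
    return block

def zero_bit_detect(key: bytes, block) -> bool:
    """Flag a block when at least 90% of its tokens are green under `key`."""
    return sum(is_green(key, t) for t in block) >= 0.9 * len(block)

def bit_key(master: bytes, position: int, bit: int) -> bytes:
    """Derive one zero-bit key per (position, bit) pair."""
    return hmac.new(master, f"{position}:{bit}".encode(), hashlib.sha256).digest()

def embed(master: bytes, message_bits, rng) -> list:
    """Block i is marked with the key for (i, message_bits[i])."""
    return [zero_bit_generate(bit_key(master, i, b), rng)
            for i, b in enumerate(message_bits)]

def extract(master: bytes, blocks) -> list:
    """For each block, report the bit whose key detects it (None if unclear)."""
    out = []
    for i, block in enumerate(blocks):
        hits = [b for b in (0, 1) if zero_bit_detect(bit_key(master, i, b), block)]
        out.append(hits[0] if len(hits) == 1 else None)
    return out

master = b"demo-master-key"
message = [1, 0, 1, 1, 0, 0, 1, 0]
blocks = embed(master, message, random.Random(1))
```

Calling `extract(master, blocks)` runs only the zero-bit detector, once per candidate key, which is what makes the reduction generic: any zero-bit scheme with the same generate and detect interface could be swapped in. Tracing text to users, as in the multi-user setting, would additionally need an encoding of user identities as messages, which this sketch does not attempt.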

adversary, construction, robustness, (17 more...)

arXiv.org Artificial Intelligence

2405.11109

Country:

North America > United States > Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)

Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors

Naseh, Ali, Roh, Jaechul, Bagdasaryan, Eugene, Houmansadr, Amir

arXiv.org Artificial Intelligence, Jun-21-2024

Recent advances in large text-conditional image generative models such as Stable Diffusion, Midjourney, and DALL-E 3 have revolutionized the field of image generation, allowing users to produce high-quality, realistic images from textual prompts. While these developments have enhanced artistic creation and visual communication, they also present an underexplored attack opportunity: an adversary could induce biases in the generated images for malicious purposes, e.g., to influence society and spread propaganda. In this paper, we demonstrate the possibility of such a bias injection threat by an adversary who backdoors such models with a small number of malicious data samples; the implemented backdoor is activated when special triggers exist in the input prompt of the backdoored models. On the other hand, the model's utility is preserved in the absence of the triggers, making the attack highly undetectable. We present a novel framework that enables efficient generation of poisoning samples with composite (multi-word) triggers for such an attack. Our extensive experiments using over 1 million generated images and against hundreds of fine-tuned models demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases can bypass conventional detection mechanisms, highlighting the challenges in proving the existence of biases within operational constraints. Our cost analysis confirms the low financial barrier to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in text-to-image generation models.

adversary, dataset, original model, (15 more...)

arXiv.org Artificial Intelligence

2406.15213

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > Nepal (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Media (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.88)

Edit Distance Robust Watermarks for Language Models

Golowich, Noah, Moitra, Ankur

arXiv.org Artificial Intelligence, Jun-4-2024

Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text. Earlier schemes could only handle stochastic substitutions and deletions, and thus we are aiming for a more natural and appealing robustness guarantee that holds with respect to edit distance. Our main result is a watermarking scheme which achieves both undetectability and robustness to edits when the alphabet size for the language model is allowed to grow as a polynomial in the security parameter. To derive such a scheme, we follow an approach introduced by Christ & Gunn (2024), which proceeds via first constructing pseudorandom codes satisfying undetectability and robustness properties analogous to those above; our key idea is to handle adversarial insertions and deletions by interpreting the symbols as indices into the codeword, which we call indexing pseudorandom codes. Additionally, our codes rely on weaker computational assumptions than used in previous work. Then we show that there is a generic transformation from such codes over large alphabets to watermarking schemes for arbitrary language models.
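The intuition behind interpreting symbols as indices into a codeword can be shown with a small example. It rests on one possible reading of the abstract, which is an assumption rather than the paper's actual construction: a binary codeword is transmitted as the list of positions holding a 1, and the receiver rebuilds a binary string by setting every received position to 1. Under that reading, an inserted or deleted symbol changes at most one bit of the rebuilt string. The helper names are hypothetical.

```python
def to_indices(bits):
    """Send a binary codeword as the list of positions that hold a 1."""
    return [i for i, b in enumerate(bits) if b == 1]

def from_indices(symbols, n):
    """Rebuild a length-n binary string; out-of-range symbols are ignored."""
    bits = [0] * n
    for s in symbols:
        if 0 <= s < n:
            bits[s] = 1
    return bits

def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

codeword = [1, 0, 1, 1, 0, 0, 1, 0]
symbols = to_indices(codeword)   # [0, 2, 3, 6]
edited = [5, 0, 3, 6]            # symbol 2 deleted, symbol 5 inserted at the front
rebuilt = from_indices(edited, len(codeword))
distance = hamming(codeword, rebuilt)  # 2: positions 2 and 5 differ
```

One deletion plus one insertion in the symbol stream shifts every later symbol, yet the rebuilt string differs from the codeword in only two positions. Edits to the symbol sequence therefore turn into a bounded number of bit flips, which a substitution-robust binary code could then absorb. The alphabet has to be as large as the codeword length, in line with the abstract's remark that the alphabet size grows with the security parameter.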

negl, prc, substitution, (15 more...)

arXiv.org Artificial Intelligence

2406.02633

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (0.63)

Industry: Information Technology > Security & Privacy (0.91)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.34)

FIMBA: Evaluating the Robustness of AI in Genomics via Feature Importance Adversarial Attacks

Skovorodnikov, Heorhii, Alkhzaimi, Hoda

arXiv.org Artificial Intelligence, Jan-19-2024

… applications and the widespread adoption of genomics sequencing, an increasing amount of AI-based algorithms and tools is entering the research and production stage affecting critical decision-making streams like drug discovery and clinical outcomes. This paper demonstrates the vulnerability of AI models often utilized in downstream tasks on recognized public genomics datasets. We undermine model robustness by deploying an attack that focuses on input transformation while mimicking the real data and confusing the model decision-making, ultimately yielding a pronounced deterioration in model performance.

For example, in recent years, genomics sequencing algorithms have become more accessible and affordable, making them widely used in clinical settings (Yang et al., 2020). The importance of genomics sequencing algorithms in disease diagnosis lies in their ability to identify genetic mutations that may cause or contribute to the development of certain diseases. This information can be used to develop personalized treatment plans and potentially prevent the onset of certain diseases in at-risk individuals. Overall, genomics sequencing algorithms have become an essential tool in modern medicine, and their usage is expected to continue to increase in the coming years as the field of genomic medicine continues to advance (Jacob …

adversarial attack, dataset, vector, (16 more...)

arXiv.org Artificial Intelligence

2401.10657

Country:

North America > Montserrat (0.04)
North America > United States > New York (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Biomedical Informatics (1.00)
(2 more...)

Excuse me, sir? Your language model is leaking (information)

Zamir, Or

arXiv.org Artificial Intelligence, Jan-18-2024

We introduce a cryptographic method to hide an arbitrary secret payload in the response of a Large Language Model (LLM). A secret key is required to extract the payload from the model's response, and without the key it is provably impossible to distinguish between the responses of the original LLM and the LLM that hides a payload. In particular, the quality of generated text is not affected by the payload. Our approach extends a recent result of Christ, Gunn and Zamir (2023) who introduced an undetectable watermarking scheme for LLMs.

empirical entropy, entropy, language model, (14 more...)

arXiv.org Artificial Intelligence

2401.1036

Country:

North America > United States > Massachusetts (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry: Information Technology > Security & Privacy (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
