AITopics

Country:

North America > United States (0.46)
Europe (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)

Neural Information Processing SystemsMar-20-2026, 22:56:15 GMT

Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature

Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter the meanings of LLM-generated responses or even forge harmful content, potentially misattributing blame to the LLM developer. To overcome this, we introduce a bi-level signature scheme, Bileve, which embeds fine-grained signature bits for integrity checks (mitigating spoofing attacks) as well as a coarse-grained signal to trace text sources when the signature is invalid (enhancing detectability) via a novel rank-based sampling strategy. Compared to conventional watermark detectors that only output binary results, Bileve can differentiate 5 scenarios during detection, reliably tracing text provenance and regulating LLMs. The experiments conducted on OPT-1.3B and LLaMA-7B demonstrate the effectiveness of Bileve in defeating spoofing attacks with enhanced detectability.

artificial intelligence, large language model, natural language, (6 more...)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Arias, Esteban Garces, Sapargali, Nurzhan, Heumann, Christian, Aßenmacher, Matthias

The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices

arXiv.org Machine LearningMar-20-2026

Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather than statistical frequency. This mismatch creates a truncation blind spot: contextually appropriate but statistically rare tokens remain accessible to humans yet unreachable by likelihood-based decoding. We hypothesize this contributes to the detectability of machine-generated text. Analyzing over 1.8 million texts across eight language models, five decoding strategies, and 53 hyperparameter configurations, we find that 8-18% of human-selected tokens fall outside typical truncation boundaries. Simple classifiers trained on predictability and lexical diversity achieve remarkable detection rates. Crucially, neither model scale nor architecture correlates strongly with detectability; truncation parameters account for most variance. Configurations achieving low detectability often produce incoherent text, indicating that evading detection and producing natural text are distinct objectives. These findings suggest detectability is enhanced by likelihood-based token selection, not merely a matter of model capability.

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2603.18482

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Neural Information Processing SystemsFeb-15-2026, 12:19:41 GMT

Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature

large language model, machine learning, natural language, (21 more...)

Country: North America > United States > Virginia (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsFeb-8-2026, 18:17:04 GMT

691dcb1d65f31967a874d18383b9da75-Paper.pdf

background, detectability, eccentricity, (15 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > Suffolk County > Huntington (0.04)
(2 more...)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
(3 more...)

Neural Information Processing SystemsFeb-8-2026, 18:16:53 GMT

691dcb1d65f31967a874d18383b9da75-AuthorFeedback.pdf

background, detectability, saccade, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Martin, Michael R., Chan, Garrick, Ma, Kwan-Liu

Interpreting Structured Perturbations in Image Protection Methods for Diffusion Models

arXiv.org Artificial IntelligenceDec-10-2025

Recent image protection mechanisms such as Glaze and Nightshade introduce imperceptible, adversarially designed perturbations intended to disrupt downstream text-to-image generative models. While their empirical effectiveness has been demonstrated, the internal structure, detectability, and representational behavior of these perturbations remain poorly understood. In this study, we demonstrated a systematic explainable AI analysis of image protection perturbations using a unified framework that integrates white-box feature-space inspection and black-box signal-level probing. Through latent-space clustering, feature-channel activation analysis, occlusion-based spatial sensitivity mapping, and frequency-domain spectral characterization, we revealed that modern protection mechanisms operate as structured, low-entropy perturbations that remain tightly coupled to underlying image content across representational, spatial, and spectral domains in all evaluated cases. We showed that protected images preserve content-driven feature organization with protection-specific substructure rather than inducing global representational drift. Detectability is governed by interacting effects of perturbation entropy, spatial deployment, and frequency alignment as revealed through combined synthetic and spectral analyses, with sequential protection amplifying detectable structure rather than suppressing it. Frequency-domain analysis further demonstrated that Glaze and Nightshade redistribute energy along dominant image-aligned frequency axes rather than introducing spectrally diffuse noise. These results suggested that contemporary image protection operates through structured feature-level deformation rather than semantic dislocation, providing mechanistic insight into why protection signals remain visually subtle yet consistently detectable. This work advances the interpretability of adversarial image protection and informs the design of future defenses and detection strategies for generative AI systems.

artificial intelligence, machine learning, natural language, (20 more...)

2512.08329

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Loodaricheh, Mahdi Arab, Manshaei, Mohammad Hossein, Raja, Anita

MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems

arXiv.org Artificial IntelligenceDec-4-2025

Abstract--Modern Intrusion Detection Systems (IDS) face severe challenges due to heterogeneous network traffic, evolving cyber threats, and pronounced data imbalance between benign and attack flows. While generative models have shown promise in data augmentation, existing approaches are limited to single modalities and fail to capture cross-domain dependencies. This paper introduces MAGE-ID (Multimodal Attack Generator for Intrusion Detection), a diffusion-based generative framework that couples tabular flow features with their transformed images through a unified latent prior . By jointly training Transformer-and CNN-based variational encoders with an EDM-style denoiser, MAGE-ID achieves balanced and coherent multimodal synthesis. Evaluations on CIC-IDS-2017 and NSL-KDD demonstrate significant improvements in fidelity, diversity, and downstream detection performance over T abSyn and T abDDPM, highlighting MAGE-ID's effectiveness for multimodal IDS augmentation.

artificial intelligence, data mining, machine learning, (16 more...)

2512.03375

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
(2 more...)

arXiv.org Artificial IntelligenceDec-2-2025

WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models

Lin, Yukang, Shao, Jiahao, Jiang, Shuoran, Zhu, Wentao, Lu, Bingjie, Wu, Xiangping, Siebert, Joanna, Chen, Qingcai

Watermarking acts as a critical safeguard in text generated by Large Language Models (LLMs). By embedding identifiable signals into model outputs, watermarking enables reliable attribution and enhances the security of machine-generated content. Existing approaches typically embed signals by manipulating token generation probabilities. Despite their effectiveness, these methods inherently face a trade-off between detectability and text quality: the signal strength and randomness required for robust watermarking tend to degrade the performance of downstream tasks. In this paper, we design a novel embedding scheme that controls seed pools to facilitate diverse parallel generation of watermarked text. Based on that scheme, we propose WaterSearch, a sentence-level, search-based watermarking framework adaptable to a wide range of existing methods. WaterSearch enhances text quality by jointly optimizing two key aspects: 1) distribution fidelity and 2) watermark signal characteristics. Furthermore, WaterSearch is complemented by a sentence-level detection method with strong attack robustness. We evaluate our method on three popular LLMs across ten diverse tasks. Extensive experiments demonstrate that our method achieves an average performance improvement of 51.01\% over state-of-the-art baselines at a watermark detectability strength of 95\%. In challenging scenarios such as short text generation and low-entropy output generation, our method yields performance gains of 47.78\% and 36.47\%, respectively. Moreover, under different attack senarios including insertion, synonym substitution and paraphrase attasks, WaterSearch maintains high detectability, further validating its robust anti-attack capabilities. Our code is available at \href{https://github.com/Yukang-Lin/WaterSearch}{https://github.com/Yukang-Lin/WaterSearch}.

large language model, machine learning, natural language, (16 more...)

2512.00837

Country:

Asia (0.28)
Europe (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Power Industry > Utilities > Nuclear (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Artificial IntelligenceNov-27-2025

TAB-DRW: A DFT-based Robust Watermark for Generative Tabular Data

Zhao, Yizhou, Li, Xiang, Song, Peter, Long, Qi, Su, Weijie

The rise of generative AI has enabled the production of high-fidelity synthetic tabular data across fields such as healthcare, finance, and public policy, raising growing concerns about data provenance and misuse. Watermarking offers a promising solution to address these concerns by ensuring the traceability of synthetic data, but existing methods face many limitations: they are computationally expensive due to reliance on large diffusion models, struggle with mixed discrete-continuous data, or lack robustness to post-modifications. To address them, we propose TAB-DRW, an efficient and robust post-editing watermarking scheme for generative tabular data. TAB-DRW embeds watermark signals in the frequency domain: it normalizes heterogeneous features via the Yeo-Johnson transformation and standardization, applies the discrete Fourier transform (DFT), and adjusts the imaginary parts of adaptively selected entries according to precomputed pseudorandom bits. To further enhance robustness and efficiency, we introduce a novel rank-based pseudorandom bit generation method that enables row-wise retrieval without incurring storage overhead. Experiments on five benchmark tabular datasets show that TAB-DRW achieves strong detectability and robustness against common post-processing attacks, while preserving high data fidelity and fully supporting mixed-type features.

machine learning, natural language, tabular data, (19 more...)

2511.216

Genre: Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)