AITopics

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Neural Information Processing SystemsFeb-17-2026, 05:41:00 GMT

LEACE: Perfect linear concept erasure in closed form

Concept erasure aims to remove specified features from a representation. It can improve fairness (e.g.

artificial intelligence, machine learning, natural language, (19 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Dominican Republic (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Neural Information Processing SystemsFeb-15-2026, 16:30:20 GMT

Robust Concept Erasure via Kernelized Rate-Distortion Maximization

Distributed representations provide a vector space that captures meaningful relationships between data instances.

large language model, machine learning, natural language, (21 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Nguyen, Viet, Patel, Vishal M.

CGCE: Classifier-Guided Concept Erasure in Generative Models

arXiv.org Artificial IntelligenceNov-26-2025

Recent advancements in large-scale generative models have enabled the creation of high-quality images and videos, but have also raised significant safety concerns regarding the generation of unsafe content. To mitigate this, concept erasure methods have been developed to remove undesirable concepts from pre-trained models. However, existing methods remain vulnerable to adversarial attacks that can regenerate the erased content. Moreover, achieving robust erasure often degrades the model's generative quality for safe, unrelated concepts, creating a difficult trade-off between safety and performance. To address this challenge, we introduce Classifier-Guided Concept Erasure (CGCE), an efficient plug-and-play framework that provides robust concept erasure for diverse generative models without altering their original weights. CGCE uses a lightweight classifier operating on text embeddings to first detect and then refine prompts containing undesired concepts. This approach is highly scalable, allowing for multi-concept erasure by aggregating guidance from several classifiers. By modifying only unsafe embeddings at inference time, our method prevents harmful content generation while preserving the model's original quality on benign prompts. Extensive experiments show that CGCE achieves state-of-the-art robustness against a wide range of red-teaming attacks. Our approach also maintains high generative utility, demonstrating a superior balance between safety and performance. We showcase the versatility of CGCE through its successful application to various modern T2I and T2V models, establishing it as a practical and effective solution for safe generative AI.

classifier, machine learning, natural language, (19 more...)

2511.05865

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.34)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

arXiv.org Artificial IntelligenceNov-10-2025

When Are Concepts Erased From Diffusion Models?

Lu, Kevin, Kriplani, Nicky, Gandikota, Rohit, Pham, Minh, Bau, David, Hegde, Chinmay, Cohen, Niv

In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches remove the target concept from the model. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) interfering with the model's internal guidance processes, and (ii) reducing the unconditional likelihood of generating the target concept, potentially removing it entirely. To assess whether a concept has been truly erased from the model, we introduce a comprehensive suite of independent probing techniques: supplying visual context, modifying the diffusion trajectory, applying classifier guidance, and analyzing the model's alternative generations that emerge in place of the erased concept. Our results shed light on the value of exploring concept erasure robustness outside of adversarial text inputs, and emphasize the importance of comprehensive evaluations for erasure in diffusion models. Our code, data, and results are available at unerasing.baulab.info.

artificial intelligence, diffusion model, machine learning, (16 more...)

2505.17013

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial IntelligenceOct-28-2025

Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

Xiong, Lexiang, Liu, Chengyu, Ye, Jingwen, Liu, Yan, Xu, Yuecong

Concept erasure in text-to-image diffusion models is crucial for mitigating harmful content, yet existing methods often compromise generative quality. We introduce Semantic Surgery, a novel training-free, zero-shot framework for concept erasure that operates directly on text embeddings before the diffusion process. It dynamically estimates the presence of target concepts in a prompt and performs a calibrated vector subtraction to neutralize their influence at the source, enhancing both erasure completeness and locality. The framework includes a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop to address latent concept persistence. As a training-free method, Semantic Surgery adapts dynamically to each prompt, ensuring precise interventions. Extensive experiments on object, explicit content, artistic style, and multi-celebrity erasure tasks show our method significantly outperforms state-of-the-art approaches. We achieve superior completeness and robustness while preserving locality and image quality (e.g., 93.58 H-score in object erasure, reducing explicit content to just 1 instance, and 8.09 H_a in style erasure with no quality degradation). This robustness also allows our framework to function as a built-in threat detection system, offering a practical solution for safer text-to-image generation.

large language model, machine learning, natural language, (19 more...)

2510.22851

Country: Europe > Switzerland (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Promising Solution (0.65)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceOct-21-2025

Beyond Fixed Anchors: Precisely Erasing Concepts with Sibling Exclusive Counterparts

Zhang, Tong, Zhang, Ru, Liu, Jianyi, Yang, Zhen, Liu, Gongshen

Existing concept erasure methods for text-to-image diffusion models commonly rely on fixed anchor strategies, which often lead to critical issues such as concept re-emergence and erosion. To address this, we conduct causal tracing to reveal the inherent sensitivity of erasure to anchor selection and define Sibling Exclusive Concepts as a superior class of anchors. Based on this insight, we propose \textbf{SELECT} (Sibling-Exclusive Evaluation for Contextual Targeting), a dynamic anchor selection framework designed to overcome the limitations of fixed anchors. Our framework introduces a novel two-stage evaluation mechanism that automatically discovers optimal anchors for precise erasure while identifying critical boundary anchors to preserve related concepts. Extensive evaluations demonstrate that SELECT, as a universal anchor solution, not only efficiently adapts to multiple erasure frameworks but also consistently outperforms existing baselines across key performance metrics, averaging only 4 seconds for anchor mining of a single concept.

artificial intelligence, machine learning, natural language, (17 more...)

2510.16342

Country:

North America > United States (1.00)
Asia > Middle East > Republic of Türkiye (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Neural Information Processing SystemsOct-9-2025, 00:31:57 GMT

Robust Concept Erasure via Kernelized Rate-Distortion Maximization

Distributed representations provide a vector space that captures meaningful relationships between data instances.

large language model, machine learning, natural language, (21 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

arXiv.org Artificial IntelligenceOct-7-2025

Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models

Gao, Daiheng, Jiang, Nanxiang, Zhang, Andi, Lu, Shilin, Tang, Yufei, Zhou, Wenbo, Zhang, Weiming, Fan, Zhaoxin

Concept erasure techniques have been widely deployed in T2I diffusion models to prevent inappropriate content generation for safety and copyright considerations. However, as models evolve to next-generation architectures like Flux, established erasure methods (\textit{e.g.}, ESD, UCE, AC) exhibit degraded effectiveness, raising questions about their true mechanisms. Through systematic analysis, we reveal that concept erasure creates only an illusion of ``amnesia": rather than genuine forgetting, these methods bias sampling trajectories away from target concepts, making the erasure fundamentally reversible. This insight motivates the need to distinguish superficial safety from genuine concept removal. In this work, we propose \textbf{RevAm} (\underline{Rev}oking \underline{Am}nesia), an RL-based trajectory optimization framework that resurrects erased concepts by dynamically steering the denoising process without modifying model weights. By adapting Group Relative Policy Optimization (GRPO) to diffusion models, RevAm explores diverse recovery trajectories through trajectory-level rewards, overcoming local optima that limit existing methods. Extensive experiments demonstrate that RevAm achieves superior concept resurrection fidelity while reducing computational time by 10$\times$, exposing critical vulnerabilities in current safety mechanisms and underscoring the need for more robust erasure techniques beyond trajectory manipulation.

diffusion model, machine learning, natural language, (17 more...)

2510.03302

Country:

North America > United States (0.46)
Asia (0.46)
Europe > Germany (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Sports (1.00)
Information Technology (0.93)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceSep-25-2025

The Medium Is Not the Message: Deconfounding Document Embeddings via Linear Concept Erasure

Fan, Yu, Tian, Yang, Ravfogel, Shauli, Sachan, Mrinmaya, Ash, Elliott, Hoyle, Alexander

Embedding-based similarity metrics between text sequences can be influenced not just by the content dimensions we most care about, but can also be biased by spurious attributes like the text's source or language. These document confounders cause problems for many applications, but especially those that need to pool texts from different corpora. This paper shows that a debiasing algorithm that removes information about observed confounders from the encoder representations substantially reduces these biases at a minimal computational cost. Document similarity and clustering metrics improve across every embedding variant and task we evaluate -- often dramatically. Interestingly, performance on out-of-distribution benchmarks is not impacted, indicating that the embeddings are not otherwise degraded.

computational linguistic, machine learning, natural language, (16 more...)

2507.01234

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Law > Government & the Courts (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)