Deep Concept Removal

Klochkov, Yegor, Ton, Jean-Francois, Guo, Ruocheng, Liu, Yang, Li, Hang

Oct-9-2023 – arXiv.org Artificial Intelligence

We address the problem of concept removal in deep neural networks, aiming to learn representations that do not encode certain specified concepts (e.g., gender). We propose a novel method based on adversarial linear classifiers trained on a concept dataset, which helps remove the targeted attribute while maintaining model performance. Our approach, Deep Concept Removal, incorporates adversarial probing classifiers at various layers of the network, effectively addressing concept entanglement and improving out-of-distribution generalization. We also introduce an implicit gradient-based technique to tackle the challenges associated with adversarial training using linear classifiers. We evaluate the ability to remove a concept on a set of popular distributionally robust optimization (DRO) benchmarks with spurious correlations, as well as on out-of-distribution (OOD) generalization tasks.
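
To make the idea of a linear probing classifier concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It fits a logistic-regression probe that tries to predict a binary concept from one layer's activations, and reports how well the concept can be linearly decoded. The toy data, the function names (`fit_linear_probe`, `probe_accuracy`), and the hand-written mask that stands in for a trained concept-removing encoder are all assumptions made for illustration. An adversarial method would instead train the network so that such a probe fails, and the abstract describes attaching probes at several layers rather than one.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_linear_probe(feats, concepts, steps=200, lr=0.5):
    """Fit a logistic-regression probe (w, b) by full-batch gradient descent."""
    n, d = len(feats), len(feats[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for z, c in zip(feats, concepts):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - c
            for j in range(d):
                gw[j] += err * z[j] / n
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def probe_accuracy(feats, concepts, w, b):
    """Fraction of examples whose concept the probe predicts correctly."""
    hits = 0
    for z, c in zip(feats, concepts):
        score = sum(wi * zi for wi, zi in zip(w, z)) + b
        hits += int((score > 0) == (c == 1))
    return hits / len(feats)

# Hypothetical toy "activations": dimension 0 carries the task label,
# dimension 1 carries the concept; label and concept are independent.
rng = random.Random(0)
concepts = [i % 2 for i in range(200)]
labels = [(i // 2) % 2 for i in range(200)]
feats = [[(2 * y - 1) + rng.gauss(0, 0.3), (2 * c - 1) + rng.gauss(0, 0.3)]
         for y, c in zip(labels, concepts)]

# Probe on the original activations: the concept is easy to decode.
w, b = fit_linear_probe(feats, concepts)
acc_full = probe_accuracy(feats, concepts, w, b)

# Stand-in for a representation with the concept removed (assumption:
# we simply zero the concept dimension instead of training an encoder).
feats_masked = [[z[0], 0.0] for z in feats]
w_m, b_m = fit_linear_probe(feats_masked, concepts)
acc_masked = probe_accuracy(feats_masked, concepts, w_m, b_m)

print(f"probe accuracy, original activations: {acc_full:.2f}")
print(f"probe accuracy, concept dimension masked: {acc_masked:.2f}")
```

Probe accuracy close to chance on the masked activations is the outcome a concept-removal method aims for. Note that a linear probe is insensitive to feature scale, so merely shrinking the concept dimension would not be enough to defeat it in this toy setup.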

artificial intelligence, machine learning, representation

Country:
- Europe > United Kingdom (0.14)

Genre:
- Research Report
  - New Finding (0.67)
  - Experimental Study (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks > Deep Learning (1.00)
