Feature Detection and Attenuation in Embeddings

Wang, Yuwei, Zheng, Yan, Peng, Yanqing, Zhang, Wei, Li, Feifei

arXiv.org Machine Learning 

Embeddings are one of the fundamental building blocks of data analysis tasks. Although most embedding schemes are designed for a specific domain, they have recently been extended to represent data from various other research domains. However, there has been relatively little discussion of how to analyze the generated embeddings or how to remove undesired features from them. In this paper, we first propose a novel embedding analysis method that quantitatively measures the features present in embedding data. We then propose an unsupervised method that removes or attenuates undesired features in the embedding by applying a Domain Adversarial Network (DAN). Our empirical results demonstrate that the proposed algorithm performs well on both industry and natural language processing benchmark datasets.
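To make "quantitatively measuring a feature in an embedding" concrete, the sketch below uses a generic linear probe. This is an assumption for illustration only: the abstract does not specify the measurement, and this is not presented as the authors' method. The function names (`train_probe`, `probe_accuracy`, `feature_score`, `demo`) and the synthetic data are hypothetical. Zeroing one coordinate stands in for attenuation here; it is not a Domain Adversarial Network.

```python
import math
import random


def train_probe(xs, ys, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with batch gradient descent."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def probe_accuracy(w, b, xs, ys):
    """Fraction of examples whose binary label the probe predicts correctly."""
    hits = 0
    for x, y in zip(xs, ys):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += int((z > 0) == (y == 1))
    return hits / len(ys)


def feature_score(xs, ys, n_train=400):
    """Held-out probe accuracy; near 0.5 means the feature is not linearly readable."""
    w, b = train_probe(xs[:n_train], ys[:n_train])
    return probe_accuracy(w, b, xs[n_train:], ys[n_train:])


def demo(seed=0, n=600):
    """Score a synthetic embedding before and after a stand-in attenuation step."""
    rng = random.Random(seed)
    ys = [i % 2 for i in range(n)]  # hypothetical binary feature, balanced classes
    # Assumed toy embedding: coordinate 0 encodes the feature, the rest is noise.
    xs = [[(2 * y - 1) + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)]
          for y in ys]
    # Stand-in for attenuation (NOT a DAN): erase the coordinate carrying the feature.
    attenuated = [[0.0] + x[1:] for x in xs]
    return feature_score(xs, ys), feature_score(attenuated, ys)
```

Calling `demo()` returns two held-out accuracies: one for the original toy embedding and one for the attenuated copy. A score well above chance suggests the feature is linearly recoverable, while a score near 0.5 suggests it has been attenuated, at least with respect to a linear probe.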