Retrieval-based Disentanglement with Distant Supervision

Zhou, Jiawei, Li, Xiaoguang, Shang, Lifeng, Jiang, Xin, Liu, Qun, Chen, Lei

Dec-15-2022–arXiv.org Artificial Intelligence

Disentangled representation learning remains challenging because ground-truth factors of variation do not naturally exist. To address this, we present Vocabulary Disentanglement Retrieval (VDR), a simple yet effective retrieval-based disentanglement framework that leverages natural language as distant supervision. Our approach builds on the widely used bi-encoder architecture, adds disentanglement heads, and is trained on data-text pairs that are readily available on the web or in existing datasets. This makes the approach task- and modality-agnostic, with potential for a wide range of downstream applications. We conduct experiments on 16 datasets in both text-to-text and cross-modal scenarios and evaluate VDR in a zero-shot setting. With the disentanglement heads and only a minor increase in parameters, VDR achieves significant improvements over the base retriever it is built upon: a 9% higher NDCG@10 score in zero-shot text-to-text retrieval and, on average, 13% higher recall in cross-modal retrieval. VDR also outperforms other baselines on most tasks while improving explainability and efficiency.
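
For readers unfamiliar with the NDCG@10 metric cited in the abstract, the following is a minimal sketch of how it is commonly computed for a single query. It is not taken from the paper: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, the linear-gain variant is an assumption, and the input list is assumed to contain the relevance labels of all judged documents in the order the retriever ranked them.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, assumed)."""
    # The item at 0-based position `rank` is discounted by log2(rank + 2),
    # so the first item has a discount of log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking."""
    # The ideal ranking places the most relevant documents first.
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        # No relevant documents for this query: define the score as 0.
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Relevance labels in the order a retriever returned the documents.
    print(ndcg_at_k([0, 1, 0, 1]))
```

A benchmark score would typically be the mean of this value over all queries in a dataset; whether the paper's evaluation uses linear or exponential gain is not stated in the abstract.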

large language model, machine learning, natural language

Country:
- Asia > China > Hong Kong (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.55)
