Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction
Sun, Qi, Huang, Kun, Yang, Xiaocui, Tong, Rong, Zhang, Kun, Poria, Soujanya
–arXiv.org Artificial Intelligence
Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in information systems that aims to simultaneously extract entities and the semantic relations between them from a document. Existing methods rely heavily on large amounts of fully labeled data. However, collecting and annotating data for newly emerging relations is time-consuming and labor-intensive. Recent Large Language Models (LLMs), such as ChatGPT and LLaMA, exhibit impressive long-text generation capabilities, which inspires us to explore an alternative approach: obtaining automatically labeled documents with new relations. In this paper, we propose GenRDK, a Zero-shot Document-level Relation Triplet Extraction (ZeroDocRTE) framework that generates labeled data by retrieving and denoising knowledge from LLMs. Specifically, we propose a chain-of-retrieval prompt that guides ChatGPT to generate labeled long-text data step by step. To improve the quality of the synthetic data, we propose a denoising strategy based on the consistency of cross-document knowledge. We then use the denoised synthetic data to fine-tune LLaMA2-13B-Chat for extracting document-level relation triplets. We perform experiments on both zero-shot document-level relation extraction and zero-shot triplet extraction on two public datasets. The experimental results show that GenRDK outperforms strong baselines.
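The abstract does not spell out the denoising criterion, so the following is only a minimal sketch of one simple form of cross-document consistency filtering, assuming that each synthetic document comes with a set of (head, relation, tail) triplets and that a triplet counts as "consistent" when it is asserted by at least `min_support` distinct documents. The function name `consistency_denoise` and the `min_support` threshold are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter

# Hypothetical type: a relation triplet (head entity, relation, tail entity).
Triplet = tuple[str, str, str]


def consistency_denoise(
    docs: list[set[Triplet]], min_support: int = 2
) -> list[set[Triplet]]:
    """Keep only triplets asserted by at least `min_support` distinct documents.

    Assumption: cross-document agreement is measured by simple document
    frequency; the actual GenRDK criterion may differ.
    """
    support: Counter = Counter()
    for triplets in docs:
        # Count each triplet at most once per document.
        support.update(set(triplets))
    # Filter every document's labels against the global support counts.
    return [{t for t in triplets if support[t] >= min_support} for triplets in docs]
```

For example, if two generated documents both state that Christopher Nolan directed Inception while only one claims a spurious cast member, the threshold keeps the shared fact and drops the isolated one. The trade-off of this design is recall: a true fact that happens to appear in only one synthetic document is discarded as well, so `min_support` controls a precision/recall balance.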
Jan-24-2024
- Country:
  - Asia
    - China > Jiangsu Province
      - Nanjing (0.04)
    - Middle East > Jordan (0.04)
    - Singapore (0.04)
  - Atlantic Ocean > North Atlantic Ocean
    - Baltic Sea (0.04)
  - Europe > Finland (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States > Illinois
      - Cook County > Chicago (0.04)
- Genre:
  - Research Report (0.50)
- Industry:
  - Leisure & Entertainment (1.00)
  - Media > Film (1.00)
- Technology: