Towards Generalized Open Information Extraction

Bowen Yu, Zhenyu Zhang, Jingyang Li, Haiyang Yu, Tingwen Liu, Jian Sun, Yongbin Li, Bin Wang

Nov-29-2022, arXiv.org Artificial Intelligence

Open Information Extraction (OpenIE) facilitates the open-domain discovery of textual facts. However, prevailing solutions evaluate OpenIE models on in-domain test sets held out from the training corpus, which violates the task's founding principle of domain independence. In this paper, we propose to advance OpenIE towards a more realistic scenario, termed Generalized OpenIE: generalizing to unseen target domains whose data distributions differ from those of the source training domains. For this purpose, we first introduce GLOBE, a large-scale, human-annotated, multi-domain OpenIE benchmark, to examine the robustness of recent OpenIE models to domain shift; the relative performance degradation of up to 70% illustrates how challenging generalized OpenIE is. We then propose DragonIE, which explores a minimalist graph expression of textual facts, the directed acyclic graph, to improve OpenIE generalization. Extensive experiments demonstrate that DragonIE outperforms previous methods in both in-domain and out-of-domain settings by as much as 6.0% absolute F1, but there is still ample room for improvement.
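The abstract reports two kinds of numbers: a relative degradation (up to 70%) and an absolute F1 difference (up to 6.0%). The sketch below shows how the two quantities are commonly computed. The function names and the example scores are hypothetical illustrations, not values or code from the paper, and the paper's exact definition of relative degradation is assumed here to be the fraction of in-domain F1 that is lost out of domain.

```python
def relative_degradation(in_domain_f1: float, out_of_domain_f1: float) -> float:
    """Fraction of the in-domain F1 lost when moving out of domain (assumed definition)."""
    if in_domain_f1 <= 0:
        raise ValueError("in_domain_f1 must be positive")
    return (in_domain_f1 - out_of_domain_f1) / in_domain_f1


def absolute_gain(new_f1: float, baseline_f1: float) -> float:
    """Difference in F1 points between two systems on the same test set."""
    return new_f1 - baseline_f1


# Hypothetical scores chosen only to illustrate the arithmetic.
example_degradation = relative_degradation(50.0, 15.0)  # 35 of 50 points lost
example_gain = absolute_gain(36.0, 30.0)                # 6 F1 points higher

print(f"relative degradation: {example_degradation:.0%}")
print(f"absolute gain: {example_gain:.1f} F1 points")
```

Under this reading, a model dropping from 50 to 15 F1 degrades by 70% in relative terms, while a 6.0-point gain is the same size regardless of the baseline it is measured against.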

data mining, machine learning, natural language, (18 more...)

Country:
- South America > Brazil
  - Rio de Janeiro > Rio de Janeiro (0.04)
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - Canada > British Columbia (0.04)
  - Dominican Republic (0.04)
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - New York > New York County
      - New York City (0.04)
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Georgia > Fulton County
      - Atlanta (0.04)
- Europe
  - Denmark (0.04)
  - Austria (0.04)
  - United Kingdom > Scotland
    - City of Edinburgh > Edinburgh (0.04)
- Asia > China
  - Hong Kong (0.04)
  - Beijing > Beijing (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Text Mining (0.62)
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Natural Language > Information Extraction (0.85)
