Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model
Kim, Daehee, Kang, Deokhyung, Ryu, Sangwon, Lee, Gary Geunbae
arXiv.org Artificial Intelligence
Knowledge Graph-to-Text (G2T) generation involves verbalizing structured knowledge graphs into natural language text. Recent advancements in Pretrained Language Models (PLMs) have improved G2T performance, but their effectiveness depends on datasets with precise graph-text alignment. However, the scarcity of high-quality, general-domain G2T datasets restricts progress in general-domain G2T research. To address this issue, we introduce the Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated using a novel method that leverages a Large Language Model (LLM) and Data-QuestEval. Our new dataset, which contains 5.85M general-domain graph-text pairs, offers high graph-text consistency without relying on external ontologies. Experimental results demonstrate that a PLM fine-tuned on WikiOFGraph outperforms those trained on other datasets across various evaluation metrics. Our method proves to be a scalable and effective solution for generating high-quality G2T data, significantly advancing the field of G2T generation.
Sep-11-2024
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia
- India > Karnataka
- Bengaluru (0.04)
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- Middle East
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Singapore (0.04)
- South Korea (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- United Kingdom > Scotland (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Slovakia (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Italy > Tuscany
- Florence (0.04)
- Austria > Vorarlberg (0.04)
- North America
- Dominican Republic (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- New Mexico > Doña Ana County (0.04)
- District of Columbia > Washington (0.04)
- Michigan (0.04)
- California > Los Angeles County
- Los Angeles (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- New York
- New York County > New York City (0.04)
- Richmond County > New York City (0.04)
- Texas
- Jones County (0.04)
- Taylor County > Abilene (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Oceania > Australia
- South America > Chile
- Genre:
- Research Report > New Finding (0.88)
- Industry:
- Leisure & Entertainment (0.46)
- Transportation > Ground (0.46)
- Technology: