WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus
Qian, Hongjing; Zhu, Yutao; Dou, Zhicheng; Gu, Haoqi; Zhang, Xinyu; Liu, Zheng; Lai, Ruofei; Cao, Zhao; Nie, Jian-Yun; Wen, Ji-Rong
arXiv.org Artificial Intelligence
In this paper, we introduce a new NLP task: generating short factual articles with references for queries by mining supporting evidence from the Web. In this task, called WebBrain, the ultimate goal is to generate a fluent, informative, and factually correct short article (e.g., a Wikipedia article) for a factual query unseen in Wikipedia. To enable experiments on WebBrain, we construct a large-scale dataset, WebBrain-Raw, by extracting English Wikipedia articles and their crawlable Wikipedia references. WebBrain-Raw is ten times larger than the largest previous comparable dataset, and we expect it to benefit the research community. From WebBrain-Raw, we construct two task-specific datasets, WebBrain-R and WebBrain-G, which are used to train an in-domain retriever and an in-domain generator, respectively. We also empirically analyze the performance of current state-of-the-art NLP techniques on WebBrain and introduce a new framework, ReGen, which improves the factual correctness of generated text through improved evidence retrieval and task-specific pre-training for generation. Experimental results show that ReGen outperforms all baselines in both automatic and human evaluations.
Apr-9-2023
- Country:
- South America > Paraguay
- Oceania > Australia
- Western Australia > Perth (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Massachusetts > Suffolk County
- Boston (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- California > Los Angeles County
- Long Beach (0.04)
- Washington > King County
- Seattle (0.14)
- Alaska > Anchorage Municipality
- Anchorage (0.04)
- New York > New York County
- New York City (0.04)
- Canada
- Quebec > Montreal (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.14)
- Europe
- Austria (0.04)
- United Kingdom > England
- Tyne and Wear > Newcastle (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Italy > Tuscany
- Florence (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- China > Beijing
- Beijing (0.04)
- Africa
- Tanzania (0.28)
- Democratic Republic of the Congo (0.14)
- East Africa (0.04)
- Sub-Saharan Africa (0.04)
- South Africa (0.04)
- Uganda > Central Region
- Kampala (0.04)
- Kenya > Nairobi City County
- Nairobi (0.04)
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Law (1.00)
- Health & Medicine (0.67)
- Government > Regional Government
- Africa Government (1.00)
- Education
- Educational Setting > Higher Education (1.00)
- Curriculum > Subject-Specific Education (0.68)
- Technology: