ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge Source
Le, Hung Tuan; To, Long Truong; Nguyen, Manh Trong; Van Nguyen, Kiet
arXiv.org Artificial Intelligence
Fact-checking is essential given the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research on the problem has concentrated on high-resource languages such as English and Chinese, while low-resource languages such as Vietnamese still lack corpora and models for fact verification. To bridge this gap, we construct ViWikiFC, the first manually annotated open-domain corpus for Vietnamese Wikipedia fact-checking, containing more than 20K claims generated by converting evidence sentences extracted from Wikipedia articles. We analyze the corpus along several linguistic dimensions: the new dependency rate, the new n-gram rate, and the new word rate. We conduct experiments on two Vietnamese fact-checking tasks, evidence retrieval and verdict prediction. BM25 and InfoXLM (Large) achieve the best results on these tasks, respectively: in evidence retrieval, BM25 reaches an accuracy of 88.30% for SUPPORTS, 86.93% for REFUTES, and only 56.67% for NEI claims; in verdict prediction, InfoXLM (Large) achieves an F1 score of 86.51%. We also evaluate a pipeline combining BM25 and InfoXLM (Large), which reaches a strict accuracy of only 67.00%. These results show that the dataset is challenging for Vietnamese language models on fact-checking tasks.
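The corpus analysis in the abstract relies on overlap statistics between a claim and its evidence sentence, which the abstract does not define. The Python sketch below assumes the new n-gram rate is the fraction of a claim's n-grams that never appear in its evidence sentence, using lowercasing and whitespace tokenization; `new_ngram_rate` and the example sentences are hypothetical. With n = 1 it only approximates a new word rate, because Vietnamese words can span several space-separated syllables and the authors may use a word segmenter instead.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return the contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def new_ngram_rate(claim: str, evidence: str, n: int = 1) -> float:
    """Fraction of the claim's n-grams that do not occur in the evidence.

    Assumption: lowercase + whitespace tokenization (syllable level for
    Vietnamese); this is not necessarily the paper's exact definition.
    """
    claim_grams = ngrams(claim.lower().split(), n)
    if not claim_grams:
        return 0.0
    evidence_grams = set(ngrams(evidence.lower().split(), n))
    new = sum(1 for g in claim_grams if g not in evidence_grams)
    return new / len(claim_grams)


# Hypothetical evidence sentence and a REFUTES-style claim derived from it.
evidence = "Hà Nội là thủ đô của Việt Nam"
claim = "Hà Nội không phải là thủ đô của Việt Nam"
print(round(new_ngram_rate(claim, evidence, n=1), 3))  # 0.2
print(round(new_ngram_rate(claim, evidence, n=2), 3))  # 0.333
```

Inserting a negation adds only two new unigrams but three new bigrams, so higher-order rates are more sensitive to how much a claim was rewritten.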
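In the evidence retrieval task, BM25 ranks candidate sentences by weighted term overlap with the claim. The following is a minimal Okapi BM25 sketch, not the authors' configuration: k1 = 1.5 and b = 0.75 are common defaults assumed here, tokens are whitespace-split, the IDF uses the non-negative `log(... + 1)` variant, and the three candidate sentences are hypothetical. The claim is paired with the highest-scoring sentence.

```python
import math
from collections import Counter
from typing import List


class BM25:
    """Minimal Okapi BM25 ranker over pre-tokenized sentences."""

    def __init__(self, corpus: List[List[str]], k1: float = 1.5, b: float = 0.75):
        self.corpus = corpus
        self.k1 = k1
        self.b = b
        self.doc_len = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_len) / max(len(corpus), 1)
        self.tf = [Counter(doc) for doc in corpus]  # term frequencies per sentence
        # Document frequency: number of sentences containing each term.
        df = Counter(term for doc in corpus for term in set(doc))
        n = len(corpus)
        # Non-negative IDF variant; rarer terms get larger weights.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: List[str], idx: int) -> float:
        """BM25 score of sentence `idx` for the query tokens."""
        tf, dl = self.tf[idx], self.doc_len[idx]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)  # length normalization
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query: List[str], k: int = 1) -> List[int]:
        """Indices of the k highest-scoring sentences."""
        scores = [self.score(query, i) for i in range(len(self.corpus))]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical candidate sentences from one Wikipedia article.
sentences = [
    "hà nội là thủ đô của việt nam",
    "thành phố hồ chí minh là thành phố lớn nhất việt nam",
    "sông hồng chảy qua hà nội",
]
bm25 = BM25([s.split() for s in sentences])
query = "thủ đô của việt nam là hà nội"
best = bm25.top_k(query.split(), k=1)
print(best)  # [0]
```

Because BM25 rewards lexical overlap, claims that paraphrase their evidence heavily are harder to match, which is one plausible reason retrieval is weakest for NEI claims.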
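The pipeline result is reported as strict accuracy. Assuming a FEVER-style definition, a claim counts as correct only when both the predicted verdict and the retrieved evidence sentence match the gold annotation, so errors in either stage lower the score. The sketch below illustrates that assumed metric; `Prediction`, `label_accuracy`, and `strict_accuracy` are hypothetical names and the four predictions are toy data, not results from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """One pipeline output (hypothetical structure)."""
    gold_label: str      # "SUPPORTS", "REFUTES" or "NEI"
    pred_label: str      # verdict from the classifier (e.g. InfoXLM)
    gold_evidence: str   # annotated evidence sentence ID
    pred_evidence: str   # sentence ID returned by the retriever (e.g. BM25)


def label_accuracy(preds: List[Prediction]) -> float:
    """Share of claims whose verdict is correct, ignoring evidence."""
    if not preds:
        return 0.0
    return sum(p.pred_label == p.gold_label for p in preds) / len(preds)


def strict_accuracy(preds: List[Prediction]) -> float:
    """Share of claims whose verdict AND evidence are both correct (assumed definition)."""
    if not preds:
        return 0.0
    hits = sum(
        p.pred_label == p.gold_label and p.pred_evidence == p.gold_evidence
        for p in preds
    )
    return hits / len(preds)


preds = [
    Prediction("SUPPORTS", "SUPPORTS", "s1", "s1"),  # verdict and evidence correct
    Prediction("REFUTES", "REFUTES", "s2", "s7"),    # right verdict, wrong evidence
    Prediction("NEI", "SUPPORTS", "s3", "s3"),       # wrong verdict
    Prediction("REFUTES", "REFUTES", "s4", "s4"),    # verdict and evidence correct
]
print(label_accuracy(preds), strict_accuracy(preds))  # 0.75 0.5
```

In the toy data, one claim has the right verdict but the wrong evidence, so label accuracy is 0.75 while strict accuracy drops to 0.5.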
May-13-2024