Multilingual Non-Factoid Question Answering with Silver Answers

Ritwik Mishra, Sreeram Vennam, Rajiv Ratn Shah, Ponnurangam Kumaraguru

Aug-20-2024–arXiv.org Artificial Intelligence

Most existing Question Answering Datasets (QuADs) focus on factoid-based, short-context Question Answering (QA) in high-resource languages. For low-resource languages, the scope of such datasets remains limited: only a few works build factoid QuADs, and none build non-factoid QuADs. This work therefore presents MuNfQuAD, a multilingual QuAD with non-factoid questions. It uses interrogative sub-headings from BBC news articles as questions and the corresponding paragraphs as silver answers. The dataset comprises over 370K QA pairs across 38 languages, including several low-resource languages, and is the largest multilingual QA dataset to date. Based on manual annotation of 790 QA pairs from MuNfQuAD (the golden set), we observe that 98% of questions can be answered using their corresponding silver answer. Our fine-tuned Answer Paragraph Selection (APS) model outperforms the baselines, attaining an accuracy of 80% and 72% and a macro F1 of 72% and 66% on the MuNfQuAD test set and the golden set, respectively. Furthermore, the APS model generalizes effectively to certain languages within the golden set, even though it was fine-tuned only on silver labels.
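The silver-answer construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each article has already been parsed into an ordered list of `(kind, text)` blocks, treats a sub-heading as interrogative only if it ends with a question mark from a small hand-picked set, and takes every paragraph up to the next sub-heading as the silver answer. The names `QAPair`, `is_interrogative`, `extract_qa_pairs`, `QUESTION_MARKS`, and `SAMPLE_ARTICLE` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: a small, incomplete set of question marks (Latin, full-width, Arabic).
QUESTION_MARKS = ("?", "？", "؟")


@dataclass
class QAPair:
    question: str  # the interrogative sub-heading
    silver_answer: list[str] = field(default_factory=list)  # paragraphs under it


def is_interrogative(heading: str) -> bool:
    """Hypothetical heuristic: a sub-heading is a question if it ends with a question mark."""
    return heading.strip().endswith(QUESTION_MARKS)


def extract_qa_pairs(blocks: list[tuple[str, str]]) -> list[QAPair]:
    """Pair each interrogative sub-heading with the paragraphs that follow it.

    `blocks` is (kind, text) in document order, where kind is "subheading" or
    "paragraph" (an assumed pre-parsed representation of an article).
    """
    pairs: list[QAPair] = []
    current: QAPair | None = None
    for kind, text in blocks:
        if kind == "subheading":
            # A new sub-heading closes the previous section; drop it if it had no paragraphs.
            if current is not None and current.silver_answer:
                pairs.append(current)
            current = QAPair(text.strip()) if is_interrogative(text) else None
        elif kind == "paragraph" and current is not None:
            # Paragraphs before the first question, or under a non-question heading, are skipped.
            current.silver_answer.append(text.strip())
    if current is not None and current.silver_answer:
        pairs.append(current)
    return pairs


# Toy article (invented for illustration, not taken from the dataset).
SAMPLE_ARTICLE = [
    ("paragraph", "Intro text."),
    ("subheading", "Why are prices rising?"),
    ("paragraph", "Fuel costs went up."),
    ("paragraph", "Demand also grew."),
    ("subheading", "Background"),
    ("paragraph", "Unrelated."),
    ("subheading", "What happens next?"),
    ("paragraph", "Talks resume Monday."),
]

if __name__ == "__main__":
    for pair in extract_qa_pairs(SAMPLE_ARTICLE):
        print(pair.question, "->", " ".join(pair.silver_answer))
```

On the toy article this yields two QA pairs; the introductory paragraph and the non-interrogative "Background" section are skipped. A real multilingual pipeline would need richer question detection, since not every script marks questions with `?` (Greek, for example, uses a semicolon-shaped question mark).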

aps model, paragraph, proceedings

Country:
- South America (0.04)
- Africa > Middle East (0.04)
- North America
  - Central America (0.04)
  - United States
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
- Europe
  - United Kingdom > Scotland (0.04)
  - Ukraine (0.04)
  - Russia (0.04)
  - Middle East (0.04)
  - Greece > Ionian Islands
    - Corfu (0.04)
  - France > Occitanie
    - Haute-Garonne > Toulouse (0.04)
- Asia
  - India (0.05)
  - Nepal (0.04)
  - China (0.04)
  - Russia (0.04)
  - Middle East > Qatar (0.04)
  - Bangladesh (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Government > Military (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Question Answering (1.00)
    - Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)