Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering
Sen, Priyanka; Aji, Alham Fikri; Saffari, Amir
arXiv.org Artificial Intelligence
We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish, for a total of 180,000 samples. Mintaka covers 8 types of complex questions, such as superlative, intersection, and multi-hop questions, which were naturally elicited from crowd workers. We run baselines over Mintaka; the best achieves 38% hits@1 in English and 31% hits@1 multilingually, showing that existing models have room for improvement. We release Mintaka at https://github.com/amazon-research/mintaka.
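The hits@1 figures count a question as answered correctly when a model's single top-ranked answer is one of the gold answers. The snippet below is a minimal sketch of that metric, assuming answers are compared as exact strings (for example, Wikidata entity IDs). The function name `hits_at_1` and the toy data are hypothetical, and the official Mintaka evaluation may normalize or match answers differently.

```python
from typing import Sequence, Set


def hits_at_1(ranked_predictions: Sequence[Sequence[str]],
              gold_answers: Sequence[Set[str]]) -> float:
    """Fraction of questions whose top-ranked prediction is a gold answer.

    Hypothetical helper: exact string matching is an assumption, not
    necessarily how the Mintaka baselines were scored.
    """
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("predictions and gold answers must align one-to-one")
    if not gold_answers:
        return 0.0
    hits = 0
    for preds, gold in zip(ranked_predictions, gold_answers):
        # Only the first (highest-ranked) prediction matters;
        # an empty prediction list counts as a miss.
        if preds and preds[0] in gold:
            hits += 1
    return hits / len(gold_answers)


# Toy example with placeholder Wikidata-style IDs (not Mintaka data).
predictions = [["Q90", "Q64"], ["Q1490"], []]
gold = [{"Q90"}, {"Q64"}, {"Q142"}]
print(hits_at_1(predictions, gold))  # one hit out of three questions
```

Because only the top prediction is checked, a model that ranks the correct answer second (as with `Q64` in the first toy question) gets no credit under hits@1.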
Oct-4-2022
- Country:
  - Africa
    - Ethiopia > Addis Ababa
      - Addis Ababa (0.04)
    - Middle East > Egypt (0.04)
  - Asia
    - India (0.04)
    - Japan (0.04)
    - Middle East
      - Iraq > Baghdad Governorate
        - Baghdad (0.04)
      - Saudi Arabia (0.04)
  - Europe
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - France (0.04)
    - Germany (0.04)
    - Italy (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
      - Greater London > London
        - Wimbledon (0.04)
  - North America
    - Dominican Republic (0.04)
    - Mexico (0.04)
    - United States
      - Alaska (0.04)
      - California > Los Angeles County
        - Los Angeles (0.04)
      - Maryland > Baltimore (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.04)
      - Missouri (0.04)
      - Texas > Travis County
        - Austin (0.04)
  - South America
- Genre:
  - Questionnaire & Opinion Survey (0.68)
  - Research Report (0.82)
- Industry:
  - Government > Regional Government
  - Leisure & Entertainment > Sports
    - Football (0.93)
  - Media > Film (0.93)
- Technology: