Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Nekoto, Wilhelmina; Marivate, Vukosi; Matsila, Tshinondiwa; Fasubaa, Timi; Kolawole, Tajudeen; Fagbohungbe, Taiwo; Akinola, Solomon Oluwole; Muhammad, Shamsuddeen Hassan; Kabongo, Salomon; Osei, Salomey; Freshia, Sackey; Niyongabo, Rubungo Andre; Macharm, Ricky; Ogayo, Perez; Ahia, Orevaoghene; Meressa, Musie; Adeyemi, Mofe; Mokgesi-Selinga, Masabata; Okegbemi, Lawrence; Martinus, Laura Jane; Tajudeen, Kolawole; Degila, Kevin; Ogueji, Kelechi; Siminyu, Kathleen; Kreutzer, Julia; Webster, Jason; Ali, Jamiil Toure; Abbott, Jade; Orife, Iroro; Ezeani, Ignatius; Dangana, Idris Abdulkabir; Kamper, Herman; Elsahar, Hady; Duru, Goodness; Kioko, Ghollah; Murhabazi, Espoir; van Biljon, Elan; Whitenack, Daniel; Onyefuluchi, Christopher; Emezue, Chris; Dossou, Bonaventure; Sibanda, Blessing; Bassey, Blessing Itoro; Olabiyi, Ayodele; Ramkilowan, Arshath; Öktem, Alp; Akinfaderin, Adewale; Bashir, Abdallah

Nov-6-2020–arXiv.org Artificial Intelligence

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem that goes beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets and MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.

computational linguistic, proceedings, translation

Country:
- North America
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - Pennsylvania > Philadelphia County
      - Philadelphia (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California
      - San Diego County > San Diego (0.04)
      - Los Angeles County > Los Angeles (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Spain (0.04)
  - Slovenia (0.04)
  - Belgium (0.04)
  - Germany
    - Saarland (0.04)
    - Berlin (0.04)
    - Bavaria > Upper Bavaria
      - Munich (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Sweden > Uppsala County
    - Uppsala (0.04)
  - Bulgaria > Sofia City Province
    - Sofia (0.04)
  - Greece > Attica
    - Athens (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Italy
    - Tuscany > Florence (0.05)
    - Trentino-Alto Adige/Südtirol > Trentino Province
      - Trento (0.04)
- Asia
  - Indonesia > Bali (0.04)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
  - India > Telangana
    - Hyderabad (0.04)
  - China
    - Hong Kong (0.04)
    - Beijing > Beijing (0.04)
- Africa
  - Sub-Saharan Africa (0.04)
  - Niger (0.04)
  - Namibia (0.04)
  - South Africa > Gauteng
    - Pretoria (0.04)
    - Johannesburg (0.04)
  - Nigeria
    - Ondo State > Akure (0.04)
    - Enugu State > Enugu (0.04)

Genre:
- Research Report (0.82)

Industry:
- Education > Educational Setting > Online (0.46)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
