Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation

ElNokrashy, Muhammad, Hendy, Amr, Maher, Mohamed, Afify, Mohamed, Awadalla, Hany Hassan

Aug-11-2022–arXiv.org Artificial Intelligence

Neural machine translation (NMT) has advanced significantly since the introduction of the transformer model (Vaswani et al., 2017). This model has shown impressive performance for bilingual translation, commonly to and from English (Hassan et al., 2018). It has also been shown that the model can be easily extended to multiple language pairs (Aharoni, Johnson, & Firat, 2019; Fan et al., 2020; Johnson et al., 2017; X. Wang, Tsvetkov, & Neubig, 2020), to and/or from English, with simple modifications to the basic architecture. This setting, referred to as multilingual neural machine translation (MNMT), holds promise for improved performance on low-resource pairs through transfer learning, as well as lower training and deployment costs per language pair. The mainstream method of training MNMT is to introduce an additional input tag at the encoder to indicate the target language, while the decoder uses the usual begin-of-sentence (BOS) token. This simple modification to the bilingual architecture has been shown to work well for up to hundreds of language pairs (Fan et al., 2020; Tran et al., 2021), given a corresponding increase in the number of parameters to handle the increased training data. Despite the emergence of modified architectures that add language-specific parameters, such as language-specific subnetworks (LASS) (Lin, Wu, Wang, & Li, 2021) and adapters (Bapna & Firat, 2019), the basic architecture remains the most effective choice for deploying large-scale production systems.
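The mainstream tagging scheme described above (a target-language tag on the encoder input, a plain BOS token on the decoder input) can be illustrated with a minimal sketch. The tag format `<2de>`, the token spellings `<s>` and `</s>`, and the helper names `target_tag` and `build_example` are assumptions made for illustration; they are not taken from the paper or from any specific toolkit.

```python
from typing import List, Tuple

BOS = "<s>"   # assumed spelling of the begin-of-sentence token
EOS = "</s>"  # assumed spelling of the end-of-sentence token


def target_tag(lang: str) -> str:
    """Hypothetical tag format marking the target language, e.g. '<2de>'."""
    return f"<2{lang}>"


def build_example(
    src_tokens: List[str], tgt_tokens: List[str], tgt_lang: str
) -> Tuple[List[str], List[str], List[str]]:
    """Build encoder input, decoder input, and decoder labels for one pair."""
    # Encoder side: the target-language tag is prepended to the source tokens.
    encoder_input = [target_tag(tgt_lang)] + src_tokens + [EOS]
    # Decoder side: the usual BOS token, with no language information.
    decoder_input = [BOS] + tgt_tokens
    # Labels are the target tokens shifted by one position, ending in EOS.
    decoder_labels = tgt_tokens + [EOS]
    return encoder_input, decoder_input, decoder_labels


enc, dec_in, labels = build_example(["Hello", "world"], ["Hallo", "Welt"], "de")
print(enc)
print(dec_in)
print(labels)
```

In this sketch only the encoder sees the target language; the decoder input is identical in form for every language pair, which is the baseline setup the paragraph describes.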

Country:
- North America > United States
  - District of Columbia > Washington (0.04)
  - Washington > King County
    - Seattle (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
- Europe
  - Czechia > Prague (0.04)
  - Spain
    - Valencian Community > Valencia Province
      - Valencia (0.04)
    - Catalonia > Barcelona Province
      - Barcelona (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
- Africa > Middle East
  - Egypt > Cairo Governorate > Cairo (0.04)

Genre:
- Research Report > Experimental Study (0.68)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
