Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican
Robinson, Nathaniel R., Hogan, Cameron J., Fulda, Nancy, Mortensen, David R.
arXiv.org Artificial Intelligence
Multilingual transfer techniques often improve low-resource machine translation (MT). Many of these techniques are applied without considering data characteristics. We show, in the context of Haitian-to-English translation, that transfer effectiveness is correlated with the amount of training data and with the relationships between knowledge-sharing languages. Our experiments suggest that, for some languages, back-translation augmentation methods become counterproductive beyond a threshold of authentic data, while cross-lingual transfer from a sufficiently related language is preferred. We complement this finding by contributing a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding. When used with multilingual techniques, orthographic transformation produces statistically significant improvements over conventional methods. In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.
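To make the idea of a rule-based orthographic transformation concrete, the following is a minimal sketch in Python. It is not the authors' French-Haitian engine, whose rules are not given in this abstract. The rule table `TOY_RULES` and the helpers `transform_word` and `transform_sentence` are hypothetical names introduced here, and the four rewrite rules are toy assumptions chosen only to show how ordered spelling rules might move transfer-language text closer to the target orthography.

```python
import re

# Hypothetical toy rules, NOT the rule set from the paper.
# Each entry is (compiled pattern, replacement); rules apply in order.
TOY_RULES = [
    (re.compile(r"eau"), "o"),        # e.g. "bateau" -> "bato"
    (re.compile(r"qu"), "k"),         # e.g. "qui" -> "ki"
    (re.compile(r"c(?=[aou])"), "k"),  # "c" before a/o/u -> "k"
    (re.compile(r"oi"), "wa"),        # "oi" -> "wa"
]


def transform_word(word, rules=TOY_RULES):
    """Lowercase one token and apply every rewrite rule in order."""
    out = word.lower()
    for pattern, replacement in rules:
        out = pattern.sub(replacement, out)
    return out


def transform_sentence(sentence, rules=TOY_RULES):
    """Apply the word-level rules to each whitespace-separated token."""
    return " ".join(transform_word(tok, rules) for tok in sentence.split())


if __name__ == "__main__":
    print(transform_sentence("qui bateau"))
```

A real engine would presumably need far more rules, context sensitivity, and the syntactic component the abstract mentions; the sketch only shows the ordered-rewrite pattern.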
Sep-13-2022
- Country:
  - South America
  - North America
    - Haiti (0.68)
    - The Bahamas (0.14)
    - Mexico (0.04)
    - Dominican Republic (0.04)
    - United States
      - Texas (0.04)
      - District of Columbia > Washington (0.04)
      - Utah > Utah County
        - Provo (0.04)
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.14)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe
    - Germany > Berlin (0.04)
    - Sweden > Östergötland County
      - Linköping (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Iceland > Capital Region
      - Reykjavik (0.04)
    - France > Île-de-France
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia > Japan
    - Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Genre:
  - Research Report > Experimental Study (0.88)
- Industry:
  - Government > Regional Government (0.46)
- Technology: