Embedding-Enhanced GIZA++: Improving Alignment in Low- and High-Resource Scenarios Using Embedding Space Geometry
Marchisio, Kelly; Xiong, Conghao; Koehn, Philipp
arXiv.org Artificial Intelligence
A popular natural language processing task decades ago, word alignment was dominated until recently by GIZA++, a statistical method based on the 30-year-old IBM models. Newer methods that outperform GIZA++ rely primarily on large machine translation models, massively multilingual language models, or supervision from GIZA++ alignments themselves. We introduce Embedding-Enhanced GIZA++, which outperforms GIZA++ without relying on any of these. Using only the monolingual embedding spaces of the source and target languages, we exceed GIZA++'s performance in every tested scenario for three language pairs. In the lowest-resource setting, we reduce GIZA++'s alignment error rate (AER) by 8.5, 10.9, and 12 points for Ro-En, De-En, and En-Fr, respectively. We release our code at https://github.com/kellymarchisio/ee-giza.
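The results above are stated in AER, where lower is better. As a reference point, the sketch below computes AER using the standard definition (Och and Ney, 2003):

AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)

Here A is the set of predicted links, S is the set of gold "sure" links and P is the set of gold "possible" links. The function name and the (source index, target index) link representation are hypothetical illustrations. They are not taken from the ee-giza repository.

```python
from typing import Set, Tuple

# Hypothetical representation: a link is a (source token index, target token index) pair.
Link = Tuple[int, int]


def alignment_error_rate(predicted: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Alignment error rate in [0, 1]; papers usually report it multiplied by 100."""
    # Sure links are conventionally a subset of possible links; enforce that here.
    possible = possible | sure
    # Guard against division by zero when there is nothing to compare.
    if not predicted and not sure:
        return 0.0
    hits_sure = len(predicted & sure)          # |A ∩ S|
    hits_possible = len(predicted & possible)  # |A ∩ P|
    return 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))


if __name__ == "__main__":
    gold_sure = {(0, 0), (1, 1)}
    gold_possible = {(0, 0), (1, 1), (2, 1)}
    system = {(0, 0), (2, 1), (2, 2)}
    aer = alignment_error_rate(system, gold_sure, gold_possible)
    print(f"{100 * aer:.1f}")  # prints 40.0
```

On this 0 to 100 scale, a drop of 8.5 AER corresponds to a change of 0.085 in the value the function returns.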
Oct-10-2022