Embedding-Enhanced Giza++: Improving Alignment in Low- and High-Resource Scenarios Using Embedding Space Geometry

Marchisio, Kelly, Xiong, Conghao, Koehn, Philipp

Oct-10-2022–arXiv.org Artificial Intelligence

A popular natural language processing task decades ago, word alignment has been dominated until recently by GIZA++, a statistical method based on the 30-year-old IBM models. New methods that outperform GIZA++ primarily rely on large machine translation models, massively multilingual language models, or supervision from GIZA++ alignments themselves. We introduce Embedding-Enhanced GIZA++, which outperforms GIZA++ without relying on any of these. Using only the monolingual embedding spaces of the source and target languages, we exceed GIZA++'s performance in every tested scenario for three language pairs. In the lowest-resource setting, we outperform GIZA++ by 8.5, 10.9, and 12 AER points for Ro-En, De-En, and En-Fr, respectively. We release our code at https://github.com/kellymarchisio/ee-giza.
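The results above are reported in AER (alignment error rate), where lower is better. As a reminder of what the metric measures, here is a minimal sketch of the standard definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of predicted links, S the gold "sure" links, and P the gold "possible" links (with S contained in P). The function name, argument names, and toy links are illustrative assumptions and are not taken from the released code.

```python
def alignment_error_rate(predicted, sure, possible):
    """Standard AER over sets of (source_index, target_index) links.

    Hypothetical helper for illustration; not part of the ee-giza repository.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # sure links always count as possible links too
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy gold annotation and system output (made-up data).
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
predicted = {(0, 0), (1, 1), (2, 2)}

aer = alignment_error_rate(predicted, sure, possible)
print(f"AER = {100 * aer:.1f}")  # AER is often reported scaled by 100
```

Under this definition, an "8.5 AER" improvement would correspond to a drop of 0.085 in the unscaled value returned by the function.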

artificial intelligence, computational linguistics, natural language

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
