Quantized Wasserstein Procrustes Alignment of Word Embedding Spaces

Aboagye, Prince O, Zheng, Yan, Yeh, Michael, Wang, Junpeng, Zhuang, Zhongfang, Chen, Huiyuan, Wang, Liang, Zhang, Wei, Phillips, Jeff

Dec-5-2022–arXiv.org Artificial Intelligence

In natural language processing (NLP), aligning monolingual embedding spaces to induce a shared cross-lingual vector space is not only useful in a variety of tasks, such as bilingual lexicon induction (BLI) (Mikolov et al., 2013; Barone, 2016; Artetxe et al., 2017; Aboagye et al., 2022), machine translation (Artetxe et al., 2018b), and cross-lingual information retrieval (Vulić & Moens, 2015), but also plays a crucial role in facilitating the cross-lingual transfer of language technologies from high-resource languages to low-resource languages. Cross-lingual word embeddings (CLWEs) represent words from two or more languages in a shared cross-lingual vector space in which words with similar meanings obtain similar vectors regardless of their language. There has been a flurry of work dominated by the so-called projection-based CLWE models (Mikolov et al., 2013; Artetxe et al., 2016, 2017, 2018a; Smith et al., 2017; Ruder et al., 2019), which aim to improve CLWE model performance significantly. Projection-based CLWE models learn a transfer function, or mapper, between two independently trained monolingual word vector spaces with limited or no cross-lingual supervision. Prominent among projection-based CLWE models are the unsupervised projection-based CLWE models (Artetxe et al., 2017; Lample et al., 2018; Alvarez-Melis & Jaakkola, 2018;
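To make the idea of a learned mapper between two embedding spaces concrete, here is a minimal sketch of the classical supervised orthogonal Procrustes solution, which many projection-based models build on. It is not the quantized Wasserstein method of this paper. The function name `orthogonal_procrustes_map`, the synthetic data, and the assumption that row i of `X` and row i of `Y` form a known translation pair are all illustrative choices, not taken from the source.

```python
import numpy as np


def orthogonal_procrustes_map(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F.

    X, Y: (n, d) arrays; row i of X and row i of Y are assumed to be
    embeddings of a known translation pair (hypothetical seed dictionary).
    """
    # Closed-form solution: SVD of the cross-covariance matrix X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Synthetic stand-in for two monolingual spaces (assumption: the target
# space is an exact rotation of the source space).
rng = np.random.default_rng(0)
d = 5
X = rng.normal(size=(50, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
Y = X @ Q

W = orthogonal_procrustes_map(X, Y)
aligned = X @ W  # source embeddings mapped into the target space
```

In this idealized setting the recovered `W` maps `X` onto `Y` exactly. Real embedding spaces are only approximately related by an orthogonal map, and unsupervised approaches must additionally infer which rows correspond to each other.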

machine learning, natural language, permutation matrix


Country:
- Asia > Russia (0.04)
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States
    - Utah (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - Russia (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Africa
  - West Africa (0.04)
  - Ghana (0.04)

Genre:
- Research Report (0.50)
- Overview (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Machine Translation (0.88)
  - Machine Learning > Statistical Learning (0.71)
  - Representation & Reasoning > Optimization (0.68)
