A Distributed Representation-Based Framework for Cross-Lingual Transfer Parsing
Guo, Jiang; Che, Wanxiang; Yarowsky, David; Wang, Haifeng; Liu, Ting
Journal of Artificial Intelligence Research
This paper investigates cross-lingual transfer parsing, which aims to induce dependency parsers for low-resource languages using only training data from a resource-rich language (e.g., English). Existing model transfer approaches typically exclude lexical features, which are not transferable across languages. In this paper, we bridge the lexical feature gap by using distributed feature representations and their composition. We provide two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space. Consequently, both lexical and non-lexical features can be used in our model for cross-lingual transfer. Furthermore, our framework is flexible enough to incorporate additional useful features such as cross-lingual word clusters. Our combined contributions achieve an average relative error reduction of 10.9% in labeled attachment score compared with the delexicalized parser, when trained on the English universal treebank and transferred to three other languages. Our model also significantly outperforms state-of-the-art delexicalized models augmented with projected cluster features on identical data. Finally, we demonstrate that our models can be further improved with minimal supervision (e.g., 100 annotated sentences) from target languages, which is of great significance for practical use.
Apr-21-2016
- Country:
- South America > Brazil (0.04)
- North America
- United States
- Maryland > Baltimore (0.14)
- Nevada (0.04)
- New York (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- California > San Diego County
- San Diego (0.04)
- Ohio > Franklin County
- Columbus (0.04)
- Florida > Broward County
- Fort Lauderdale (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Pennsylvania
- Philadelphia County > Philadelphia (0.04)
- Allegheny County > Pittsburgh (0.04)
- Massachusetts
- Suffolk County > Boston (0.04)
- Middlesex County > Cambridge (0.04)
- Canada > Quebec
- Montreal (0.04)
- Europe
- Czechia > Prague (0.04)
- Russia (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Bulgaria > Sofia City Province
- Sofia (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- France
- Hauts-de-France > Nord
- Lille (0.04)
- Grand Est > Meurthe-et-Moselle
- Nancy (0.04)
- Italy
- Liguria > Genoa (0.04)
- Trentino-Alto Adige/Südtirol > Trentino Province
- Trento (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Finland > Uusimaa
- Helsinki (0.04)
- Sweden
- Uppsala County > Uppsala (0.04)
- Västra Götaland > Gothenburg (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- United Kingdom
- Scotland > City of Edinburgh
- Edinburgh (0.04)
- England > Greater Manchester
- Manchester (0.04)
- Asia
- Singapore (0.04)
- South Korea (0.04)
- Russia (0.04)
- Taiwan (0.04)
- Indonesia (0.04)
- China
- Beijing > Beijing (0.05)
- Heilongjiang Province > Harbin (0.04)
- Middle East
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Qatar > Ad-Dawhah
- Doha (0.04)
- India > Maharashtra
- Mumbai (0.04)
- Japan > Honshū
- Chūbu > Aichi Prefecture > Nagoya (0.04)
- Africa
- Middle East > Morocco (0.04)
- Senegal > Kolda Region
- Kolda (0.04)
- Genre:
- Research Report > New Finding (0.68)
- Technology: