Cross-Lingual Constituency Parsing for Middle High German: A Delexicalized Approach
Nie, Ercong, Schmid, Helmut, Schütze, Hinrich
arXiv.org Artificial Intelligence
Constituency parsing plays a fundamental role in advancing natural language processing (NLP) tasks. However, training an automatic syntactic analysis system for ancient languages solely on annotated parse data is a formidable task, because treebanks for such languages are inherently difficult to build: doing so demands extensive linguistic expertise, which leads to a scarcity of available resources. To overcome this hurdle, cross-lingual transfer techniques, which require minimal or even no annotated data for low-resource target languages, offer a promising solution. In this study, we focus on building a constituency parser for Middle High German (MHG) under realistic conditions, where no annotated MHG treebank is available for training. In our approach, we leverage the linguistic continuity and structural similarity between MHG and Modern German (MG), along with the abundance of MG treebank resources. Specifically, by employing the delexicalization method, we train a constituency parser on MG parse datasets and perform cross-lingual transfer to MHG parsing. Our delexicalized constituency parser achieves an F1-score of 67.3% on the MHG test set, outperforming the best zero-shot cross-lingual baseline by 28.6 percentage points. These encouraging results underscore the practicality and potential of automatic syntactic analysis for other ancient languages that face challenges similar to those of MHG.
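The abstract does not spell out the delexicalization procedure, so the following is only a minimal sketch under one common reading: every word form in a bracketed treebank tree is replaced by its part-of-speech tag, so that a parser trained on the result sees tag sequences rather than language-specific vocabulary. The function name `delexicalize`, the bracketed input format, and the example tree with STTS-style tags are illustrative assumptions, not details taken from the paper.

```python
import re

# Matches a pre-terminal node "(TAG word)": an opening bracket, a tag,
# whitespace, a single token with no brackets, and a closing bracket.
# Assumption: trees use Penn-style bracketing with one token per leaf.
_PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def delexicalize(tree: str) -> str:
    """Replace each word form in a bracketed tree with its POS tag.

    This is a hypothetical helper illustrating delexicalization in general;
    it is not the preprocessing code used by the authors.
    """
    return _PRETERMINAL.sub(r"(\1 \1)", tree)


if __name__ == "__main__":
    # Illustrative Modern German tree; the tags are assumed, not from the paper.
    lexicalized = "(S (NP (ART der) (NN Ritter)) (VVFIN reitet))"
    print(delexicalize(lexicalized))
```

Under this reading, a parser trained on delexicalized MG trees can be applied to MHG text once that text has been POS-tagged with the same tagset, since the parser input no longer depends on the MG lexicon.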
Aug-29-2023
- Genre:
- Research Report > New Finding (1.00)