Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text
Aquino, Angelina, de Leon, Franz
arXiv.org Artificial Intelligence
The grammatical analysis of texts in any written language typically involves a number of basic processing tasks, such as tokenization, morphological tagging, and dependency parsing. State-of-the-art systems can achieve high accuracy on these tasks for languages with large datasets, but yield poor results for languages which have little to no annotated data. To address this issue for the Tagalog language, we investigate the use of alternative language resources for creating task-specific models in the absence of dependency-annotated Tagalog data. We also explore the use of word embeddings and data augmentation to improve performance when only a small amount of annotated Tagalog data is available. We show that these zero-shot and few-shot approaches yield substantial improvements on grammatical analysis of both in-domain and out-of-domain Tagalog text compared to state-of-the-art supervised baselines.
Jan-5-2023
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America > United States
- Texas > Dallas County
- Dallas (0.04)
- New York > New York County
- New York City (0.04)
- Europe
- Spain (0.04)
- Slovenia (0.04)
- Czechia > Prague (0.04)
- Italy > Tuscany
- Florence (0.04)
- Germany
- Berlin (0.04)
- Baden-Württemberg > Tübingen Region
- Tübingen (0.04)
- Bulgaria > Sofia City Province
- Sofia (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Singapore (0.04)
- China > Hong Kong (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Philippines
- Visayas > Central Visayas
- Province of Cebu > City of Cebu (0.04)
- Luzon > National Capital Region
- City of Manila (0.15)
- City of Quezon (0.04)
- Malaysia > Kuala Lumpur
- Kuala Lumpur (0.04)
- Africa > Chad
- Salamat (0.04)
- Genre:
- Research Report (0.50)
- Technology: