Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches
Dementieva, Daryna; Khylenko, Valeriia; Groh, Georg
arXiv.org Artificial Intelligence
Despite the large number of labeled datasets in the NLP text classification field, a persistent imbalance in data availability across languages remains evident. Ukrainian, in particular, is a language that can still benefit from the continued refinement of cross-lingual methodologies. To our knowledge, there is a severe lack of Ukrainian corpora for typical text classification tasks. In this work, we leverage state-of-the-art advances in NLP to explore cross-lingual knowledge transfer methods that avoid manual data curation: large multilingual encoders and translation systems, LLMs, and language adapters. We test these approaches on three text classification tasks (toxicity classification, formality classification, and natural language inference) and provide a "recipe" for the optimal setups.
Apr-2-2024
- Country:
  - North America
    - United States
      - Hawaii (0.04)
      - California (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe
    - Ukraine (0.04)
    - Romania > Sud - Muntenia Development Region
      - Giurgiu County > Giurgiu (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
    - Germany > Bavaria
      - Upper Bavaria > Munich (0.04)
    - Finland > Uusimaa
      - Helsinki (0.04)
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
  - Asia
    - Singapore (0.04)
    - China > Hong Kong (0.04)
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
- Genre:
  - Research Report (0.40)
- Technology: