Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios
Aditya Joshi, Diptesh Kanojia, Heather Lent, Hour Kaing, Haiyue Song
arXiv.org Artificial Intelligence
Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages situated in 'lower-resource' scenarios such as dialects/sociolects (national or social varieties of a language), Creoles (languages arising from linguistic contact between multiple languages) and other low-resource languages. This introductory tutorial will identify common challenges, approaches, and themes in natural language processing (NLP) research for confronting and overcoming the obstacles inherent to data-poor contexts.

Selected as a tutorial at COLING 2025.

While each of the lower-resource scenarios bears its unique socio-historical contexts, the tutorial brings together researchers working separately in these scenarios. Collectively, the tutorial will connect past research in terms of:
- Challenges in data curation
- Potential for wide linguistic variation (e.g., existing on a linguistic continuum or eschewing strict spelling conventions, etc.)
- Need for smart modeling choices over greedy ones
- Increased model vulnerability

This introductory tutorial identifies the emergence of 'lower-resource' scenarios, specifically national varieties, Creoles and other low-resource languages, and highlights commonalities and differences ...
Sep-19-2024
- Country:
  - Asia > Japan
    - Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
  - Europe
    - Denmark
      - Capital Region > Copenhagen (0.05)
      - North Jutland > Aalborg (0.05)
    - France > Auvergne-Rhône-Alpes
    - United Kingdom > England
      - Surrey (0.04)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
- Genre:
- Industry:
  - Education > Curriculum (0.36)
- Technology: