Natural Language Processing for Dialects of a Language: A Survey
Joshi, Aditya, Dabre, Raj, Kanojia, Diptesh, Li, Zhuang, Zhan, Haolan, Haffari, Gholamreza, Dippold, Doris
arXiv.org Artificial Intelligence
State-of-the-art natural language processing (NLP) models are trained on massive corpora and report superlative performance on evaluation datasets. This survey examines an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches. We describe a wide range of NLP tasks in two categories: natural language understanding (NLU) (for tasks such as dialect classification, sentiment analysis, parsing, and NLU benchmarks) and natural language generation (NLG) (for summarisation, machine translation, and dialogue systems). The survey is also broad in its coverage of languages, which include English, Arabic, and German, among others. We observe that past work in NLP concerning dialects goes deeper than mere dialect classification. It ranges from early approaches based on sentence transduction to recent approaches that integrate hypernetworks into LoRA. We expect that this survey will be useful to NLP researchers interested in building equitable language technologies by rethinking LLM benchmarks and model architectures.
Jan-10-2024
- Country:
- South America
- Paraguay > Asunción
- Asunción (0.04)
- Colombia > Meta Department
- Villavicencio (0.04)
- Oceania
- New Zealand (0.04)
- Australia
- Victoria > Melbourne (0.04)
- South Australia > Adelaide (0.04)
- New South Wales > Sydney (0.04)
- North America
- United States
- Maryland > Baltimore (0.04)
- Texas > Travis County
- Austin (0.04)
- Ohio > Franklin County
- Columbus (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Canada
- Ontario > Toronto (0.04)
- Quebec > Montreal (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Europe
- Italy (0.04)
- Germany (0.04)
- Finland (0.04)
- Slovenia (0.04)
- Russia (0.04)
- Middle East (0.04)
- Bulgaria
- Varna Province > Varna (0.04)
- Sofia City Province > Sofia (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Spain
- Valencian Community > Valencia Province
- Valencia (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- United Kingdom > England
- Surrey > Guildford (0.04)
- Cambridgeshire > Cambridge (0.04)
- France
- Île-de-France > Paris
- Paris (0.04)
- Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
- Marseille (0.04)
- Sweden > Östergötland County
- Linköping (0.04)
- Faroe Islands > Streymoy
- Tórshavn (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Estonia > Tartu County
- Tartu (0.04)
- Asia
- Singapore (0.04)
- Indonesia > Bali (0.04)
- Taiwan (0.04)
- Russia (0.04)
- India > Maharashtra (0.04)
- Philippines > Luzon
- National Capital Region > City of Manila (0.14)
- South Korea > Seoul
- Seoul (0.04)
- China
- Middle East
- Jordan (0.04)
- Lebanon (0.04)
- Kuwait (0.04)
- Iraq > Kurdistan Region (0.04)
- Yemen > Amanat Al Asimah
- Sanaa (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Qatar > Ad-Dawhah
- Doha (0.04)
- Japan
- Kyūshū & Okinawa
- Okinawa (0.04)
- Kyūshū
- Miyazaki Prefecture > Miyazaki (0.04)
- Kumamoto Prefecture > Kumamoto (0.04)
- Honshū
- Tōhoku > Aomori Prefecture
- Aomori (0.04)
- Kantō > Tokyo Metropolis Prefecture
- Tokyo (0.14)
- Kansai
- Kyoto Prefecture > Kyoto (0.04)
- Osaka Prefecture > Osaka (0.04)
- Africa > Middle East
- Morocco (0.04)
- Genre:
- Research Report (1.00)
- Overview (1.00)
- Industry:
- Health & Medicine (0.92)
- Education (0.67)
- Technology:
- Information Technology > Artificial Intelligence
- Representation & Reasoning > Rule-Based Reasoning (0.93)
- Natural Language
- Machine Translation (1.00)
- Large Language Model (1.00)
- Grammars & Parsing (1.00)
- Chatbot (0.93)
- Discourse & Dialogue (0.89)
- Machine Learning
- Statistical Learning (1.00)
- Neural Networks > Deep Learning (0.93)