El Volumen Louder Por Favor: Code-switching in Task-oriented Semantic Parsing
Arash Einolghozati, Abhinav Arora, Lorena Sainz-Maza Lecanda, Anuj Kumar, Sonal Gupta
arXiv.org Artificial Intelligence
Being able to parse code-switched (CS) utterances, such as Spanish+English or Hindi+English, is essential for democratizing task-oriented semantic parsing systems in certain locales. In this work, we focus on Spanglish (Spanish+English) and release a dataset, CSTOP, containing 5800 CS utterances alongside their semantic parses. We examine the CS generalizability of various cross-lingual (XL) models and demonstrate the advantage of pre-trained XL language models when data for only one language is present. As such, we focus on improving the pre-trained models for the case when only an English corpus, alongside either zero or a few CS training instances, is available. We propose two data augmentation methods for the zero-shot and the few-shot settings: fine-tuning using translate-and-align, and augmenting using a generation model followed by match-and-filter. Combining the few-shot setting with the above improvements decreases the initial 30-point accuracy gap between the zero-shot and the full-data settings by two thirds.
Jan-28-2021
- Country:
- Oceania > Australia
- North America
- United States
- Texas > Travis County
- Austin (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.28)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Colorado > Denver County
- Denver (0.04)
- California
- San Francisco County > San Francisco (0.14)
- San Diego County > San Diego (0.04)
- Canada > British Columbia
- Europe
- Italy (0.04)
- Germany > Berlin (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Austria > Styria
- Graz (0.04)
- Asia
- China > Hong Kong (0.05)
- Indonesia > Bali (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- India > West Bengal
- Kolkata (0.04)
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Genre:
- Research Report (0.82)
- Technology: