El Volumen Louder Por Favor: Code-switching in Task-oriented Semantic Parsing

Arash Einolghozati, Abhinav Arora, Lorena Sainz-Maza Lecanda, Anuj Kumar, Sonal Gupta

Jan-28-2021, arXiv.org Artificial Intelligence

Being able to parse code-switched (CS) utterances, such as Spanish+English or Hindi+English, is essential to democratize task-oriented semantic parsing systems for certain locales. In this work, we focus on Spanglish (Spanish+English) and release a dataset, CSTOP, containing 5800 CS utterances alongside their semantic parses. We examine the CS generalizability of various cross-lingual (XL) models and demonstrate the advantage of pre-trained XL language models when data for only one language is present. We therefore focus on improving the pre-trained models for the case in which only an English corpus, together with either zero or a few CS training instances, is available. We propose two data augmentation methods for the zero-shot and the few-shot settings: fine-tuning using translate-and-align, and augmentation using a generation model followed by match-and-filter. Combining the few-shot setting with the above improvements reduces the initial 30-point accuracy gap between the zero-shot and the full-data settings by two thirds.
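
The abstract names translate-and-align without describing it. As a rough illustration only, the sketch below shows one common way slot annotations can be carried from an English utterance to its translation through a word alignment. It is an assumption about what such a step might look like, not the authors' procedure. The function `project_slots`, the alignment dictionary, the example translation, and the label `SL:VOLUME_LEVEL` are all hypothetical.

```python
from typing import Dict, List, Tuple

# A slot is (start, end_exclusive, label) over token indices.
Slot = Tuple[int, int, str]


def project_slots(src_slots: List[Slot],
                  alignment: Dict[int, List[int]]) -> List[Slot]:
    """Map source-side slot spans onto target tokens via a word alignment.

    `alignment` maps each source token index to the target token indices
    it is aligned with. Slots with no aligned target token are dropped.
    """
    projected: List[Slot] = []
    for start, end, label in src_slots:
        # Collect every target index aligned to a token inside the span.
        targets = sorted(t for s in range(start, end)
                         for t in alignment.get(s, []))
        if not targets:
            continue  # nothing to project onto
        # Use the smallest contiguous span covering all aligned tokens.
        projected.append((targets[0], targets[-1] + 1, label))
    return projected


# Toy example (hypothetical alignment and label, for illustration only).
en = ["set", "the", "volume", "louder", "please"]
es = ["pon", "el", "volumen", "más", "alto", "por", "favor"]
alignment = {0: [0], 1: [1], 2: [2], 3: [3, 4], 4: [5, 6]}
slots = [(3, 4, "SL:VOLUME_LEVEL")]

result = project_slots(slots, alignment)
print(result, es[result[0][0]:result[0][1]])
```

In this toy case the single-token English span "louder" is projected onto the two-token Spanish span "más alto". Taking the covering span is a simplification: a real alignment may be noisy or non-contiguous and would need filtering.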

computational linguistic, proceedings, utterance

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.28)
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Colorado > Denver County
      - Denver (0.04)
    - California
      - San Francisco County > San Francisco (0.14)
      - San Diego County > San Diego (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.14)
- Europe
  - Italy (0.04)
  - Germany > Berlin (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Spain > Valencian Community
    - Valencia Province > Valencia (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - Austria > Styria
    - Graz (0.04)
- Asia
  - China > Hong Kong (0.05)
  - Indonesia > Bali (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)
  - India > West Bengal
    - Kolkata (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Grammars & Parsing (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.30)
