LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging
Rosenbaum, Andy; Soltan, Saleh; Hamza, Wael; Versley, Yannick; Boese, Markus
arXiv.org Artificial Intelligence
We present LINGUIST, a method for generating annotated data for Intent Classification and Slot Tagging (IC+ST) by fine-tuning AlexaTM 5B, a 5-billion-parameter multilingual sequence-to-sequence (seq2seq) model, on a flexible instruction prompt. In a 10-shot novel-intent setting on the SNIPS dataset, LINGUIST surpasses state-of-the-art approaches (Back-Translation and Example Extrapolation) by a wide margin, with absolute improvements on the target intents of +1.9 points in IC Recall and +2.5 points in ST F1 Score. In the zero-shot cross-lingual setting of the mATIS++ dataset, LINGUIST outperforms a strong baseline of Machine Translation with Slot Alignment by +4.14 absolute points in ST F1 Score across 6 languages, while matching the baseline on IC. Finally, we verify our results on an internal large-scale multilingual dataset for conversational-agent IC+ST and show significant improvements over a baseline that uses Back-Translation, Paraphrasing and Slot Catalog Resampling. To our knowledge, we are the first to demonstrate instruction fine-tuning of a large-scale seq2seq model to control the outputs of multilingual intent- and slot-labeled data generation.
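To make concrete what slot-annotated utterances look like and how span-level ST F1 is commonly computed, here is a minimal Python sketch. It assumes a hypothetical bracket markup `[slot_name value words]` for generated utterances, which is illustrative and not necessarily LINGUIST's actual output format. The sketch converts that markup into BIO slot tags and scores predicted tags against gold tags with micro-averaged, exact-match span F1, a common convention for slot tagging. The function names `to_bio`, `spans`, and `slot_f1` are hypothetical.

```python
import re
from typing import List, Optional, Set, Tuple

# Hypothetical markup: each slot value is written as "[slot_name value words]".
# This format is an assumption for illustration only.
SLOT_PATTERN = re.compile(r"\[(\w+) ([^\]]+)\]")


def to_bio(annotated: str) -> Tuple[List[str], List[str]]:
    """Convert a bracket-annotated utterance into tokens and BIO slot tags."""
    tokens: List[str] = []
    tags: List[str] = []
    pos = 0
    for match in SLOT_PATTERN.finditer(annotated):
        # Words before this slot lie outside any slot, so they are tagged "O".
        for word in annotated[pos:match.start()].split():
            tokens.append(word)
            tags.append("O")
        slot, value_words = match.group(1), match.group(2).split()
        # First word of the slot value gets "B-", the rest get "I-".
        for i, word in enumerate(value_words):
            tokens.append(word)
            tags.append(("B-" if i == 0 else "I-") + slot)
        pos = match.end()
    # Trailing words after the last slot are also "O".
    for word in annotated[pos:].split():
        tokens.append(word)
        tags.append("O")
    return tokens, tags


def spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Extract (slot, start, end) spans from BIO tags; end is exclusive."""
    result: Set[Tuple[str, int, int]] = set()
    start: Optional[int] = None
    slot: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        if tag.startswith("I-") and slot == tag[2:]:
            continue  # the current span continues
        if slot is not None:
            result.add((slot, start, i))
            start, slot = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, slot = i, tag[2:]  # open a new span (stray "I-" also opens one)
    return result


def slot_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match span F1 over a corpus of BIO sequences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = spans(g), spans(p)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


tokens, tags = to_bio("book a table at [restaurant_name joe's diner] in [city san diego]")
print(list(zip(tokens, tags)))
```

Because matching is exact at the span level, a prediction that truncates "san diego" to "san" counts as both a false positive and a false negative. This is why small tagging differences can move ST F1 by whole points.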
Sep-20-2022
- Country:
  - North America
    - United States
      - West Virginia (0.04)
      - North Dakota (0.04)
      - New Jersey (0.04)
      - Michigan > Washtenaw County
        - Ann Arbor (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Massachusetts > Suffolk County
        - Boston (0.04)
      - Illinois > Cook County
        - Chicago (0.04)
      - Washington > King County
        - Seattle (0.04)
      - California
        - San Francisco County > San Francisco (0.04)
        - San Diego County > San Diego (0.04)
      - New York > New York County
        - New York City (0.04)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Germany
      - Berlin (0.04)
      - North Rhine-Westphalia > Cologne Region
        - Aachen (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
- Genre:
  - Research Report > Promising Solution (0.47)