Data Augmentation for Improving Tail-traffic Robustness in Skill-routing for Dialogue Systems

Wu, Ting-Wei; Sheikholeslami, Fatemeh; Kachuee, Mohammad; Do, Jaeyoung; Lee, Sungjin

Jun-7-2023–arXiv.org Artificial Intelligence

Large-scale conversational systems typically rely on a skill-routing component to route a user request to an appropriate skill and interpretation to serve the request. In such systems, the agent is responsible for serving thousands of skills and interpretations, which creates a long-tail distribution due to the natural frequency of requests. For example, samples related to playing music might be a thousand times more frequent than those asking for theatre show times. Moreover, the inputs used for ML-based skill routing are often a heterogeneous mix of strings, embedding vectors, and categorical and scalar features, which makes augmentation-based long-tail learning approaches challenging to apply. To improve skill-routing robustness, we propose a method for augmenting heterogeneous skill-routing data, together with training targeted at robust operation in long-tail data regimes. We explore a variety of conditional encoder-decoder generative frameworks to perturb original data fields and create synthetic training data. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments using real-world data from a commercial conversational system. Based on the experimental results, the proposed approach improves more than 80% (51 out of 63) of intents with fewer than 10K traffic instances in the skill-routing replication task.
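
The general idea of augmenting only the tail of a heterogeneous dataset can be illustrated with a minimal Python sketch. It is not the method of the paper: the conditional encoder-decoder generator is replaced here by simple Gaussian noise on a single scalar field, and the record layout (`intent`, `utterance`, `device`, `score`), the function name `augment_tail`, and the threshold are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def augment_tail(samples, threshold, copies=1, noise=0.05, seed=0):
    """Return the original samples plus synthetic copies of tail-intent samples.

    Each sample is a dict with hypothetical keys:
      'intent' (str), 'utterance' (str), 'device' (str), 'score' (float).
    An intent counts as "tail" when it has fewer than `threshold` samples.
    """
    rng = random.Random(seed)
    counts = Counter(s["intent"] for s in samples)
    synthetic = []
    for s in samples:
        if counts[s["intent"]] >= threshold:
            continue  # head intent: no augmentation
        for _ in range(copies):
            new = dict(s)  # shallow copy, the original record is not modified
            # Stand-in perturbation (assumption): jitter only the scalar field.
            # String and categorical fields are copied unchanged.
            new["score"] = s["score"] + rng.gauss(0.0, noise)
            new["synthetic"] = True
            synthetic.append(new)
    return samples + synthetic


# Toy data: one frequent intent and one rare intent.
data = [
    {"intent": "play_music", "utterance": "play jazz", "device": "speaker", "score": 0.9}
    for _ in range(5)
] + [
    {"intent": "show_times", "utterance": "theatre show times", "device": "phone", "score": 0.4}
]

augmented = augment_tail(data, threshold=3, copies=2)
```

In this toy example only the rare `show_times` record is duplicated and perturbed, so the frequent intent keeps its original sample count while the rare one gains synthetic samples.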

artificial intelligence, machine learning, natural language

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Personal Assistant Systems (1.00)
  - Natural Language > Chatbot (1.00)
  - Machine Learning (1.00)
