Extreme Model Compression for On-device Natural Language Understanding

Sathyendra, Kanthashree Mysore; Choudhary, Samridhi; Nicolich-Henkin, Leah

Nov-30-2020–arXiv.org Artificial Intelligence

In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with very large vocabularies. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance.
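The central idea in the abstract, compressing the word-embedding table while the NLU task is being learned so that the task loss shapes the compressed representation, can be illustrated with a small sketch. The abstract does not say which compression operator the authors use; here, purely as an assumption for illustration, the embedding matrix is replaced by a low-rank factorization E ≈ A·B trained end to end with a bag-of-words intent classifier. All names (`train_task_aware`, `factorized_embedding_params`, `compression_rate`), the toy data, and the 100,000-word / 300-dimension / rank-8 sizes are hypothetical. If "compression rate" is read as 1 − compressed/original, the reported 97.4% means the compressed model keeps about 2.6% of the original size, roughly 38 times smaller.

```python
import numpy as np


def compression_rate(original_params: int, compressed_params: int) -> float:
    """Fraction of parameters removed, e.g. 0.974 means 97.4% smaller."""
    return 1.0 - compressed_params / original_params


def factorized_embedding_params(vocab_size: int, dim: int, rank: int) -> int:
    """Parameter count of a low-rank factorization E ~= A @ B (A: V x r, B: r x d)."""
    return vocab_size * rank + rank * dim


def train_task_aware(counts, labels, n_classes, dim=16, rank=4, lr=0.1, steps=500, seed=0):
    """Hypothetical sketch: jointly learn factorized embeddings (A, B) and a softmax
    intent classifier (W, b) with plain gradient descent.

    counts: (N, V) bag-of-words matrix; rows are normalized to mean-pool embeddings.
    The cross-entropy gradient flows into A and B, so the compressed embedding
    is fit to the downstream task rather than compressed in isolation.
    """
    rng = np.random.default_rng(seed)
    n, vocab = counts.shape
    pool = counts / counts.sum(axis=1, keepdims=True)  # mean-pooling weights
    A = rng.normal(0.0, 1.0, (vocab, rank))
    B = rng.normal(0.0, 1.0 / np.sqrt(rank), (rank, dim))
    W = rng.normal(0.0, 1.0 / np.sqrt(dim), (dim, n_classes))
    b = np.zeros(n_classes)
    y = np.eye(n_classes)[labels]  # one-hot intent labels
    losses = []
    for _ in range(steps):
        # Forward pass: pooled low-rank codes -> utterance vector -> logits.
        pooled_a = pool @ A  # (N, r)
        h = pooled_a @ B  # (N, d)
        logits = h @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(float(-np.mean(np.sum(y * np.log(probs + 1e-12), axis=1))))
        # Backward pass: all gradients are computed before any update.
        d_logits = (probs - y) / n
        d_W = h.T @ d_logits
        d_b = d_logits.sum(axis=0)
        d_h = d_logits @ W.T
        d_B = pooled_a.T @ d_h
        d_A = pool.T @ (d_h @ B.T)
        W -= lr * d_W
        b -= lr * d_b
        B -= lr * d_B
        A -= lr * d_A
    return (A, B, W, b), losses


# Hypothetical toy data: 6-word vocabulary, 2 intents; word 5 is shared.
counts = np.array([
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

params, losses = train_task_aware(counts, labels, n_classes=2)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")

# Hypothetical sizes: a 100,000 x 300 table replaced by a rank-8 factorization.
rate = compression_rate(100_000 * 300, factorized_embedding_params(100_000, 300, 8))
print(f"compression rate: {rate:.1%}")  # prints "compression rate: 97.3%"
```

Because the task gradient reaches `A` and `B` directly, the low-rank table is optimized for what the classifier needs rather than for reconstructing the original embeddings, which loosely mirrors the abstract's finding that the downstream-task signal matters for compression. Other operators (quantization, pruning, hashing) could be swapped in for the factorization without changing the joint-training structure.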

compression, neural network, utterance

Genre:
- Research Report (0.70)

Industry:
- Information Technology (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language > Chatbot (0.93)
  - Representation & Reasoning > Personal Assistant Systems (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)