Transformer to CNN: Label-scarce distillation for efficient text classification

Chia, Yew Ken, Witteveen, Sam, Andrews, Martin

Sep-8-2019–arXiv.org Machine Learning

Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop paper outlines how our proposed convolutional student architecture, having been trained by a distillation process from a large-scale model, can achieve a 300× inference speedup and a 39× reduction in parameter count. In some cases, the student model performance surpasses its teacher on the studied tasks.

classification, machine learning, natural language

Country:
- North America > Canada (0.14)

Genre:
- Research Report (0.83)

Industry:
- Education (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Classification (0.69)
  - Machine Learning > Neural Networks
    - Deep Learning (0.52)
