Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Wang, Jixuan, Wang, Kuan-Chieh, Rudzicz, Frank, Brudno, Michael

Jan-27-2022–arXiv.org Artificial Intelligence

Large pretrained language models (LMs) like BERT have improved performance in many disparate natural language processing (NLP) tasks. However, fine-tuning such models requires a large number of training examples for each target task. At the same time, many realistic NLP problems are "few-shot", lacking a sufficiently large training set. In this work, we propose a novel conditional neural process-based approach for few-shot text classification that learns to transfer from other diverse tasks with rich annotation. Our key idea is to represent each task using gradient information from a base model and to train an adaptation network that modulates a text classifier conditioned on the task representation. While previous task-aware few-shot learners represent tasks by input encoding, our novel task representation is more powerful, as the gradient captures the input-output relationships of a task. Experimental results show that our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches on a collection of diverse few-shot tasks. We further conduct analyses and ablations to justify our design choices.
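
The key idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear softmax classifier as the base model (the paper uses a pretrained LM such as BERT), and a hypothetical linear adaptation network that emits a per-feature scale and shift. All function names and the scale-and-shift form of modulation are assumptions made for illustration only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def task_gradient(weights, support_x, support_y):
    """Task representation (illustrative): mean gradient of the cross-entropy
    loss w.r.t. the base classifier's weights on the support set, flattened."""
    n_classes, n_feats = len(weights), len(weights[0])
    grad = [[0.0] * n_feats for _ in range(n_classes)]
    for x, y in zip(support_x, support_y):
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        probs = softmax(logits)
        for c in range(n_classes):
            err = probs[c] - (1.0 if c == y else 0.0)
            for j in range(n_feats):
                grad[c][j] += err * x[j] / len(support_x)
    return [g for row in grad for g in row]


def adaptation_network(task_repr, adapt_weights, n_feats):
    """Hypothetical linear adaptation network: maps the task representation
    to a per-feature scale (centered at 1) and shift (centered at 0)."""
    out = [sum(a * t for a, t in zip(row, task_repr)) for row in adapt_weights]
    scale = [1.0 + o for o in out[:n_feats]]
    shift = out[n_feats:]
    return scale, shift


def modulated_predict(weights, x, scale, shift):
    """Classify x after modulating its features with the task-conditioned
    scale and shift (an assumed form of modulation)."""
    h = [s * xi + b for s, xi, b in zip(scale, x, shift)]
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in weights]
    return softmax(logits)


# Two toy tasks with identical inputs but opposite labels.
base_weights = [[0.0, 0.0], [0.0, 0.0]]      # 2 classes, 2 features
inputs = [[1.0, 0.0], [0.0, 1.0]]
repr_a = task_gradient(base_weights, inputs, [0, 1])
repr_b = task_gradient(base_weights, inputs, [1, 0])

# Untrained adaptation network with small random weights (illustrative).
rng = random.Random(0)
adapt_weights = [[rng.uniform(-0.1, 0.1) for _ in repr_a] for _ in range(4)]
scale, shift = adaptation_network(repr_a, adapt_weights, n_feats=2)
probs = modulated_predict(base_weights, [1.0, 0.0], scale, shift)
```

In this toy setup the two tasks share the same inputs, so a representation built only from input encodings could not tell them apart, whereas `repr_a` and `repr_b` differ because the gradient depends on the labels as well. In practice the adaptation network would be trained across many tasks; here it is left untrained.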

Country:
- North America
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
  - Canada > Ontario
    - Toronto (0.14)
- Europe
  - Czechia > Prague (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Media (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Classification (0.86)
  - Machine Learning
    - Neural Networks (0.93)
    - Statistical Learning (0.67)