Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation
Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno
arXiv.org Artificial Intelligence
Large pretrained language models (LMs) like BERT have improved performance in many disparate natural language processing (NLP) tasks. However, fine-tuning such models requires a large number of training examples for each target task. At the same time, many realistic NLP problems are "few-shot", without a sufficiently large training set. In this work, we propose a novel conditional neural process-based approach for few-shot text classification that learns to transfer from other diverse tasks with rich annotation. Our key idea is to represent each task using gradient information from a base model and to train an adaptation network that modulates a text classifier conditioned on the task representation. While previous task-aware few-shot learners represent tasks by input encoding, our novel task representation is more powerful, as the gradient captures input-output relationships of a task. Experimental results show that our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches on a collection of diverse few-shot tasks. We further conduct analyses and ablations to justify our design choices.
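The key idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the base model, the form of the gradient features, or the adaptation network. The sketch assumes a tiny linear softmax classifier as the base model, the mean cross-entropy gradient over a support set as the task representation, and a hypothetical untrained linear adaptation network that emits per-feature scale and shift values (a FiLM-style modulation, chosen here only for illustration). All function and variable names are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_logits(W, x):
    """Linear classifier: one weight row per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def task_gradient(W, support):
    """Task representation (assumption): mean gradient of the cross-entropy
    loss with respect to W over the support set, flattened to a vector."""
    n_classes, dim = len(W), len(W[0])
    grad = [[0.0] * dim for _ in range(n_classes)]
    for x, y in support:
        probs = softmax(class_logits(W, x))
        for c in range(n_classes):
            err = probs[c] - (1.0 if c == y else 0.0)
            for j in range(dim):
                grad[c][j] += err * x[j] / len(support)
    return [g for row in grad for g in row]


def adaptation_network(task_repr, A_scale, A_shift):
    """Hypothetical linear adaptation network mapping the task
    representation to per-feature (scale, shift) parameters."""
    scale = [1.0 + sum(a * t for a, t in zip(row, task_repr)) for row in A_scale]
    shift = [sum(a * t for a, t in zip(row, task_repr)) for row in A_shift]
    return scale, shift


def adapted_predict(W, x, scale, shift):
    """Modulate the features, then classify with the given weights."""
    h = [s * v + b for s, v, b in zip(scale, x, shift)]
    return softmax(class_logits(W, h))


# Two toy tasks with identical inputs but flipped labels.
W_base = [[0.0, 0.0], [0.0, 0.0]]
task_a = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
task_b = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]

grad_a = task_gradient(W_base, task_a)
grad_b = task_gradient(W_base, task_b)

# Untrained, randomly initialized adaptation weights (illustration only;
# in a real system these would be learned across many training tasks).
rng = random.Random(0)
A_scale = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
A_shift = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]

scale_a, shift_a = adaptation_network(grad_a, A_scale, A_shift)
scale_b, shift_b = adaptation_network(grad_b, A_scale, A_shift)

W_clf = [[1.0, 0.0], [0.0, 1.0]]  # toy classifier being modulated
probs_a = adapted_predict(W_clf, [1.0, 0.0], scale_a, shift_a)
probs_b = adapted_predict(W_clf, [1.0, 0.0], scale_b, shift_b)

print("task A representation:", grad_a)
print("task B representation:", grad_b)
print("task A prediction:", probs_a)
print("task B prediction:", probs_b)
```

The two toy tasks share the same inputs and differ only in their labels, so an encoding built from inputs alone would give them the same representation. The gradient-based representations differ, which is the sense in which the gradient reflects the input-output relationship of a task.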
Jan-27-2022