K For The Price Of 1: Parameter Efficient Multi-task And Transfer Learning

Mudrakarta, Pramod Kaushik, Sandler, Mark, Zhmoginov, Andrey, Howard, Andrew

Oct-24-2018, arXiv.org Machine Learning

We introduce a novel method that enables parameter-efficient transfer and multi-task learning. The basic approach is to allow a model patch (a small set of parameters) to specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases allows a network to learn a completely different embedding that can be used for different tasks (such as converting an SSD detection model into a 1000-class classification model while reusing 98% of the parameters of the feature extractor). Similarly, we show that re-learning the existing low-parameter layers (such as depth-wise convolutions) also improves accuracy significantly. Our approach supports both simultaneous (multi-task) learning and sequential transfer learning, in which we adapt pretrained networks to solve new problems. For multi-task learning, despite using far fewer parameters than traditional logits-only fine-tuning, we match single-task performance.
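To give a rough sense of why a patch made of scales and biases is so small, the sketch below counts parameters in a toy stack of depthwise-separable blocks. Everything in it is an assumption made for illustration: the `SeparableBlock` class, the channel widths, and the choice to count one scale and one bias per channel after each convolution are hypothetical and are not the architecture or the numbers reported in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparableBlock:
    """Hypothetical block: depthwise conv -> scale/bias -> 1x1 conv -> scale/bias."""

    in_channels: int
    out_channels: int
    kernel_size: int = 3

    def depthwise_params(self) -> int:
        # One k x k filter per input channel.
        return self.kernel_size ** 2 * self.in_channels

    def pointwise_params(self) -> int:
        # A 1x1 convolution that mixes channels.
        return self.in_channels * self.out_channels

    def scale_bias_params(self) -> int:
        # One scale and one bias per channel after each of the two convolutions.
        return 2 * self.in_channels + 2 * self.out_channels


def count_parameters(blocks):
    """Return parameter counts grouped by the kind of layer they belong to."""
    depthwise = sum(b.depthwise_params() for b in blocks)
    pointwise = sum(b.pointwise_params() for b in blocks)
    scale_bias = sum(b.scale_bias_params() for b in blocks)
    return {
        "depthwise": depthwise,
        "pointwise": pointwise,
        "scale_bias": scale_bias,
        "total": depthwise + pointwise + scale_bias,
    }


# Assumed channel widths, chosen only to make the arithmetic concrete.
BLOCKS = [
    SeparableBlock(32, 64),
    SeparableBlock(64, 128),
    SeparableBlock(128, 128),
    SeparableBlock(128, 256),
]

counts = count_parameters(BLOCKS)

# Patch option 1: per-task scales and biases only.
scale_bias_fraction = counts["scale_bias"] / counts["total"]

# Patch option 2: scales and biases plus the depthwise filters.
larger_patch_fraction = (counts["scale_bias"] + counts["depthwise"]) / counts["total"]

if __name__ == "__main__":
    print(f"total parameters:          {counts['total']}")
    print(f"scale/bias patch fraction: {scale_bias_fraction:.3f}")
    print(f"plus depthwise fraction:   {larger_patch_fraction:.3f}")
```

In this toy setting the pointwise convolutions hold most of the parameters, so a per-task patch that touches only scales and biases, or scales, biases, and depthwise filters, leaves the bulk of the weights shared across tasks. The exact fractions depend entirely on the assumed widths.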

artificial intelligence, machine learning, model patch

Country:
- North America > United States (0.68)

Genre:
- Research Report
  - New Finding (0.46)
  - Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Transfer Learning (0.93)
  - Neural Networks > Deep Learning (0.47)
