BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Open in new window