Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks
