Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks