[Table 1: Starting from one auxiliary task (Exemplar-MT), we keep adding new tasks.]

Neural Information Processing Systems 

We would like to thank all the reviewers for their insightful comments, especially during this difficult time. However, lowering the training loss may cause overfitting, especially when training data is scarce. The superiority of ARML is verified in the experiments: the error rate decreases as each new task is added. In 'Baseline + ARML', for a fair comparison, we keep the same training process. We will elaborate on this in the final version. We will also try other tasks, e.g., reinforcement learning.
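The 'Baseline + ARML' comparison combines the main-task loss with reweighted auxiliary-task losses. A minimal sketch of such a weighted combination, assuming a softmax parameterization of the task weights (the function names and the softmax choice are our illustrative assumptions, not the paper's exact formulation):

```python
import math

def combined_loss(main_loss, aux_losses, logits):
    """Combine a main-task loss with reweighted auxiliary-task losses.

    Weights are a softmax over per-task logits, so they stay positive
    and sum to 1. (Illustrative sketch, not the exact ARML update.)
    """
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    loss = main_loss + sum(w * l for w, l in zip(weights, aux_losses))
    return loss, weights

# Equal logits give equal weights of 0.5 each:
# loss = 1.0 + 0.5 * 0.5 + 0.5 * 2.0 = 2.25
loss, weights = combined_loss(1.0, [0.5, 2.0], [0.0, 0.0])
```

In practice the logits would be learned (e.g., updated by gradient descent to favor helpful auxiliary tasks), so adding a new task only changes the weight vector, not the training procedure itself.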