Finding the most relevant auxiliary forecasting tasks for pre-training and knowledge transferring to a given primary

Neural Information Processing Systems

We thank the reviewers for their valuable and timely comments. We would first like to emphasize the challenges and contributions: Section 3.2 explains how to calculate this hyper-gradient of Framework for BackPropagation, LeCun, 1988), and widely adopted in the literature [14, 15, 35]. We will further polish the notation to make it more consistent. 'Pretrain (Top)' is much better than 'Pretrain (Down)'.