Review for NeurIPS paper: RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning

Neural Information Processing Systems 

Strengths: The paper is one of the first to study continual learning in recurrent settings, and it shows promising performance on the image captioning task. It proposes RATT, a novel approach to recurrent continual learning based on attentional masking and inspired by the earlier HAT method. The method introduces three masks (a_x, a_h, and a_s), applied to the embedding, the hidden state, and the vocabulary respectively, and the ablation study shows that all three contribute to the final continual learning performance. Beyond the proposed approach, the paper also explores adapting weight regularization and knowledge distillation-based methods to the recurrent continual learning problem. The experiments show strong results: RATT substantially outperforms simple baselines (such as fine-tuning) as well as previous regularization-based and distillation-based approaches (EWC and LwF).
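
To make the role of the three masks concrete, here is a minimal sketch of HAT-style gating as I understand it. It is not the authors' implementation. It assumes each mask is an elementwise sigmoid of a scaled per-task embedding and that a_s multiplies the vocabulary logits directly. The names task_mask and apply_mask, the scale value, and all numbers are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def task_mask(task_embedding, scale):
    """HAT-style gate: elementwise sigmoid(scale * e), each value in (0, 1)."""
    return [sigmoid(scale * e) for e in task_embedding]

def apply_mask(values, mask):
    """Elementwise product of activations with a task mask."""
    return [v * m for v, m in zip(values, mask)]

# Hypothetical learned embeddings for one task (assumption, not from the paper).
scale = 10.0
a_x = task_mask([4.0, -4.0, 4.0], scale)        # mask over embedding units
a_h = task_mask([-4.0, 4.0], scale)             # mask over hidden-state units
a_s = task_mask([4.0, 4.0, -4.0, -4.0], scale)  # mask over vocabulary entries

# Toy activations for a single decoding step.
word_embedding = [0.5, -1.2, 0.3]
hidden_state = [0.7, -0.1]
vocab_logits = [2.0, 0.5, 1.5, -0.3]

masked_embedding = apply_mask(word_embedding, a_x)
masked_hidden = apply_mask(hidden_state, a_h)
masked_logits = apply_mask(vocab_logits, a_s)  # assumption: mask scales logits
```

With a large scale the gates are nearly binary, so units and vocabulary entries switched off for a task pass almost no signal. That is the property that lets a masking approach shield earlier tasks from later updates.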