RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning (SUPPLEMENTARY MATERIAL)
We exploit the categorical image annotations available in many captioning datasets. The influence of the people category is clearly visible.

Figure 2: RATT ablation on the MS-COCO validation set using different attention masks. Evaluation is the same as for MS-COCO (Figure 4).

In Figures 6 and 7, we compare the performance of all considered approaches on the MS-COCO validation set. These learning curves and heatmaps illustrate the ability of RATT to remember old tasks.
Review for NeurIPS paper: RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
Strengths: The paper is one of the first to study continual learning in recurrent settings and shows promising performance on the image captioning task. It proposes RATT, a novel approach to recurrent continual learning based on attentional masking, inspired by the earlier HAT method. The proposed method introduces three masks (a_x, a_h, and a_s) applied to the word embedding, hidden state, and vocabulary output, and the ablation study shows that all three components contribute to the final continual learning performance. In addition to the proposed approach, the paper also explores adapting weight regularization and knowledge distillation approaches to the recurrent continual learning problem. In its experiments, the paper shows strong results, largely outperforming simple baselines (such as fine-tuning) and previous regularization- or distillation-based approaches (EWC and LwF).
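The masking scheme the review describes can be sketched as HAT-style task-conditioned gates: per-task embeddings passed through a scaled sigmoid produce near-binary masks that gate the word embedding (a_x), the recurrent state (a_h), and the vocabulary logits (a_s). The sizes, variable names, and the toy recurrent update below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
emb_dim, hid_dim, vocab_size = 8, 16, 32
num_tasks, s = 3, 50.0  # s: gate "temperature"; large s makes gates near-binary (as in HAT)

# Learnable per-task mask embeddings for embedding, hidden state, and vocabulary.
E_x = rng.normal(size=(num_tasks, emb_dim))
E_h = rng.normal(size=(num_tasks, hid_dim))
E_s = rng.normal(size=(num_tasks, vocab_size))

def task_masks(t):
    """HAT-style gates: scaled sigmoids over per-task embeddings."""
    return sigmoid(s * E_x[t]), sigmoid(s * E_h[t]), sigmoid(s * E_s[t])

def masked_step(x_emb, h, W_out, t):
    """One toy decoding step with masks on embedding, hidden state, and logits."""
    a_x, a_h, a_s = task_masks(t)
    x_emb = x_emb * a_x            # gate the word embedding
    h = np.tanh(h + x_emb.sum()) * a_h  # gate the (toy) recurrent state
    logits = (h @ W_out) * a_s     # gate the vocabulary output
    return h, logits
```

Because the masks are near-binary, units gated off for earlier tasks can be protected from gradient updates on later tasks, which is the mechanism that limits forgetting.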
Meta-review for NeurIPS paper: RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
The paper received two accept reviews and one borderline reject [R1]. The main concern of R1 is that the paper relies on simple and not the most recent approaches for both captioning and continual learning. The other reviewers and I agree with that, but believe that this is reasonable for one of the first papers on continual learning for captioning, even if it is not optimal. R1 did not respond after the rebuttal. The reviewers appreciate the paper's contributions, including 1) the first paper on continual learning for image captioning.
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now, surprisingly little attention has been focused on continual learning of recurrent models applied to problems like image captioning. In this paper we take a systematic look at continual learning of LSTM-based models for image captioning. We propose an attention-based approach that explicitly accommodates the transient nature of vocabularies in continual image captioning tasks -- i.e., that task vocabularies are not disjoint. We call our method Recurrent Attention to Transient Tasks (RATT), and also show how to adapt continual learning approaches based on weight regularization and knowledge distillation to recurrent continual learning problems.
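The weight-regularization adaptation mentioned in the abstract can be illustrated with a generic EWC-style quadratic penalty, which applies identically to recurrent (LSTM) and feed-forward weight matrices. The function name, the plain-NumPy parameterization, and the toy numbers are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """EWC-style penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params, old_params, fisher: lists of NumPy arrays of matching shapes,
    covering all weights (recurrent ones included, treated no differently).
    """
    return 0.5 * lam * sum(
        float(np.sum(f * (p - p0) ** 2))
        for p, p0, f in zip(params, old_params, fisher)
    )
```

Parameters important for earlier tasks (large Fisher values) are anchored near their old values, while unimportant ones remain free to adapt to the new task.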