Reviews: Episodic Memory in Lifelong Language Learning

Neural Information Processing Systems 

The paper addresses the important topic of lifelong learning and proposes an episodic memory to avoid catastrophic forgetting. The memory uses a key-value representation built on an encoder-decoder architecture based on BERT. Training is performed on the concatenation of several datasets, with no need to specify dataset identifiers. The work is highly significant and the novelty of the contribution is remarkable. One point that deserved more attention is the choice of strategies for reading from and writing to the episodic memory (see also comments below).
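
To make the reading and writing question concrete, here is a minimal sketch of a key-value memory. It assumes random writing and nearest-neighbour reading purely for illustration; the class name, the `write_prob` parameter, and the toy two-dimensional keys are hypothetical and not taken from the paper, where keys would come from the BERT-based encoder.

```python
import math
import random


class EpisodicMemory:
    """Toy key-value store: keys are vectors, values are stored examples."""

    def __init__(self, write_prob=1.0, seed=0):
        self.write_prob = write_prob  # assumed: chance of storing each example
        self.rng = random.Random(seed)
        self.keys = []
        self.values = []

    def write(self, key, value):
        # Random writing: keep each example with probability write_prob.
        if self.rng.random() < self.write_prob:
            self.keys.append(list(key))
            self.values.append(value)

    def read(self, query, k=1):
        # Nearest-neighbour reading: values of the k keys closest to the query
        # under Euclidean distance.
        order = sorted(
            range(len(self.keys)),
            key=lambda i: math.dist(query, self.keys[i]),
        )
        return [self.values[i] for i in order[:k]]


# Hypothetical keys standing in for encoder outputs.
memory = EpisodicMemory(write_prob=1.0)
memory.write([0.0, 0.0], "example A")
memory.write([1.0, 1.0], "example B")
memory.write([5.0, 5.0], "example C")
nearest = memory.read([0.9, 1.2], k=2)
print(nearest)
```

Other choices (for instance, writing only surprising examples, or reading uniformly at random) would slot into `write` and `read` without changing the interface, which is why a comparison of such strategies seems feasible.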