Review for NeurIPS paper: Improved Schemes for Episodic Memory-based Lifelong Learning

Neural Information Processing Systems 

There has been a plethora of recent and historical work on this topic, finding different ways to help networks alleviate catastrophic forgetting: a network trained on tasks A_0 through A_i forgets these tasks to differing degrees when it is subsequently trained on tasks A_{i+1} onward. Most methods can be divided into regularisation-based, memory-based, and meta-learning-based approaches. One relatively recent method is GEM (Gradient Episodic Memory), along with the related A-GEM. GEM stores examples from previously seen tasks in an episodic memory. When learning a new task, the gradient update is modified so that it does not increase the loss on previous tasks, as measured on the examples held in memory.
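To make the constraint concrete, here is a minimal sketch of the projection step used by A-GEM. GEM itself solves a quadratic program with one constraint per previous task. A-GEM replaces these constraints with a single one on g_ref, the average gradient over a batch sampled from memory. If the new-task gradient g conflicts with g_ref (negative dot product), g is projected so the conflict vanishes. The sketch assumes gradients are already flattened into plain Python lists, and the function names are hypothetical. It is not the paper's implementation.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def agem_project(g: List[float], g_ref: List[float]) -> List[float]:
    """Hypothetical helper: A-GEM-style projection of the new-task gradient g.

    g_ref is assumed to be the average gradient computed on examples drawn
    from the episodic memory.
    """
    overlap = dot(g, g_ref)
    if overlap >= 0:
        # To first order, stepping along g does not increase the memory loss,
        # so the gradient is used unchanged (this also covers g_ref == 0).
        return list(g)
    # Remove the component of g that points against g_ref:
    # g_tilde = g - (g . g_ref / g_ref . g_ref) * g_ref
    scale = overlap / dot(g_ref, g_ref)
    return [gi - scale * ri for gi, ri in zip(g, g_ref)]


if __name__ == "__main__":
    # Conflicting case: the projected gradient is orthogonal to g_ref.
    print(agem_project([1.0, -1.0], [0.0, 1.0]))  # [1.0, 0.0]
    # Non-conflicting case: the gradient passes through untouched.
    print(agem_project([1.0, 2.0], [1.0, 0.0]))  # [1.0, 2.0]
```

The single averaged constraint is what makes A-GEM cheaper than GEM: one extra dot product and vector update per step instead of a quadratic program whose size grows with the number of past tasks.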