On Quadratic Penalties in Elastic Weight Consolidation
There are situations in which we would like to train a neural network to perform a range of tasks. This is usually possible if we can train the network on all tasks simultaneously. The problem is harder if we would like to train the network sequentially, one task after another. The naive approach of training an already trained neural network on a new task via gradient descent leads to a phenomenon known as catastrophic forgetting: the network's performance on previously learned tasks rapidly deteriorates as soon as we start training on a new task. Kirkpatrick et al. [2017] propose a novel algorithm, elastic weight consolidation (EWC), to address this problem while maintaining the simplicity of relying on backpropagation and stochastic gradient descent as the main algorithmic workhorses. The authors observe that catastrophic forgetting would not happen if the network's parameters were learnt in a Bayesian fashion: instead of obtaining a single estimate of the parameters θ via gradient descent, we calculate the Bayesian posterior distribution p( θ D
Dec-11-2017