Review for NeurIPS paper: Mutual exclusivity as a challenge for deep neural networks

The experiments assume that a model exhibits the bias only if the probability it assigns to the new class(es) given a new word is one. It is unclear why the models should be expected to assign a probability of one to the correct (new) class. When evaluating classification models, the correct class is the one to which the model assigns the highest probability, and that probability is often much smaller than one: even when each individual incorrect class receives a small probability, the total probability mass over all incorrect classes can be relatively large when there are many classes. Moreover, when testing the models in the continual learning setup, the authors should stop training before the model overfits and report the performance of the models on a held-out split.
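To make this point concrete, here is a minimal sketch (with illustrative, randomly generated logits rather than the paper's models) showing that a softmax classifier can rank the correct class first while still assigning it a probability far below one:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 1000

# Hypothetical logits from a classifier over many classes; class 0 is the
# "correct" class and receives a clearly higher logit than the rest.
logits = rng.normal(0.0, 1.0, num_classes)
logits[0] += 5.0

# Numerically stable softmax.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The correct class wins the argmax, yet its probability is well below 1,
# because the small probabilities of the 999 incorrect classes add up.
print("argmax:", probs.argmax())
print("p(correct):", probs[0])
print("total mass on incorrect classes:", 1.0 - probs[0])
```

Under these assumptions the correct class is the model's top prediction, but most of the probability mass is still spread over the incorrect classes, so requiring p = 1 is an unreasonably strict criterion for the bias.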