Reviews: LCA: Loss Change Allocation for Neural Network Training

Neural Information Processing Systems 

Originality: While many works have studied the properties of the endpoint found by SGD, the literature examining SGD training dynamics in the context of deep neural networks is sparser, and the loss change allocation metric appears novel to me. The paper is therefore original in that respect.

Quality: The paper is in general of good quality. However, a few specific points could be improved:
- It would be nice to characterize the approximation error introduced by the first-order Taylor expansion (see the sketch after this list).
- The authors claim that the loss contribution is well grounded, while other Fisher-information-based metrics depend heavily on the chosen parametrization. Could the authors expand on this point and provide a more detailed comparison between LC and the metrics introduced in [1] and [13]?
- In the introduction, the authors claim that entire layers drift in the wrong direction during training.
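
To make the first point concrete, here is a minimal sketch of the kind of approximation-error check I have in mind, assuming a PyTorch setup: the per-parameter allocation is the first-order Taylor term (dL/dtheta_i) * (delta theta_i), and its sum is compared against the actual loss change over one step. The model, data, and learning rate below are hypothetical placeholders, and the paper itself may use a more accurate integration scheme than this single first-order term.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: model, data, and optimizer are placeholders.
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Loss and gradients at theta_t.
loss_before = loss_fn(model(x), y)
opt.zero_grad()
loss_before.backward()
grads = [p.grad.clone() for p in model.parameters()]
params_before = [p.detach().clone() for p in model.parameters()]

opt.step()  # theta_{t+1} = theta_t - lr * grad

# Per-parameter allocation (first-order Taylor term):
# A_i = (dL/dtheta_i) * (theta_{t+1, i} - theta_{t, i})
lca = [g * (p.detach() - p0)
       for g, p, p0 in zip(grads, model.parameters(), params_before)]
first_order_estimate = sum(a.sum() for a in lca)

# Actual loss change over the same step, on the same batch.
with torch.no_grad():
    loss_after = loss_fn(model(x), y)
true_change = loss_after - loss_before.detach()

print(f"true dL = {true_change.item():.6f}, "
      f"first-order estimate = {first_order_estimate.item():.6f}, "
      f"approximation error = {(true_change - first_order_estimate).item():.6f}")
```

Reporting how this residual scales with the learning rate (or step size) would directly characterize the error the method incurs.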