Title

Author

Neural Information Processing Systems 

In this section, we formalize and substantiate the claims of Theorem 1. Theorem 1 has three parts, which we address in the following sections. First, in Section A.2, we show that the classifier makes progress during the early-learning phase: over the first T iterations, the gradient is well correlated with v and the accuracy on mislabeled examples increases. However, as noted in the main text, this early progress halts because the gradient terms corresponding to correctly labeled examples begin to disappear. We prove this rigorously in Section A.3, which shows that the overall magnitude of the gradient terms corresponding to correctly labeled examples shrinks over the first T iterations. Finally, in Section A.4, we prove the claimed asymptotic behavior: as t!1, gradient descent perfectly memorizes the noisy labels.