Appendix A: Gradient Descent and Neural Tangent Kernel

Gradient Descent. Since we consider the square loss and
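As a minimal, hypothetical illustration (not the paper's construction), gradient descent on the square loss for a linearized model f(x) = φ(x)ᵀθ — the regime in which the neural tangent kernel (NTK) approximation applies — can be sketched as follows; the feature matrix `Phi`, targets `y`, and step size are all assumed for the example:

```python
import numpy as np

# Hypothetical toy setup: a linearized model f(x) = phi(x) @ theta,
# which is the regime in which NTK analysis applies.
rng = np.random.default_rng(0)
n, d = 20, 5
Phi = rng.standard_normal((n, d))   # fixed (frozen) NTK-style features
y = rng.standard_normal(n)          # targets

theta = np.zeros(d)
lr = 0.01
for _ in range(5000):
    residual = Phi @ theta - y      # f(x_i) - y_i
    grad = Phi.T @ residual / n     # gradient of (1/2n) * ||Phi theta - y||^2
    theta -= lr * grad

# Under the square loss, GD converges to the least-squares solution;
# the training dynamics are governed by the empirical kernel K = Phi @ Phi.T.
theta_star, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.allclose(theta, theta_star, atol=1e-3))
```

The point of the sketch is only that, for a linearized model under the square loss, GD training is a linear dynamical system driven by the empirical kernel matrix.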

Neural Information Processing Systems 

We provide here a brief overview of reproducing kernel Hilbert spaces (RKHS); more details can be found in Appendix G.2. In this work, we impose the following assumptions.

Remark 5. Assumption D.3 can be replaced by an alternative assumption. Assumption D.1 is related to the neural network and GD training, where similar settings have been used. Assumption D.2 imposes conditions on the underlying true conditional probability in the non-separable case. This assumption essentially requires that the conditional probability lie within the function class generated by the GD-trained neural networks we consider (and thus can be calibrated).
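As a brief, hypothetical illustration of what it means for a conditional probability to lie in an RKHS function class (this example is not part of the paper's analysis), a function in the RKHS of a kernel k fitted to data takes the form f(x) = Σᵢ αᵢ k(x, xᵢ); kernel ridge regression with an assumed Gaussian kernel and regularization `lam` yields such an expansion:

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Hypothetical data: a conditional-probability-like target in [0, 1].
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(30, 1))
p = 1 / (1 + np.exp(-3 * X[:, 0]))   # assumed "true" conditional probability
y = p                                # noiseless targets, for the sketch only

lam = 1e-3                           # ridge regularization strength (assumed)
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# The fitted function f(x) = sum_i alpha_i k(x, x_i) lies in the RKHS of k.
f_train = K @ alpha
print(float(np.max(np.abs(f_train - p))))  # small in-sample error
```

The expansion f(x) = Σᵢ αᵢ k(x, xᵢ) is exactly the kind of RKHS element that a calibration-style assumption requires the true conditional probability to match.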
