Appendix A: Gradient Descent and Neural Tangent Kernel

Gradient Descent. Since we consider the square loss and ...
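The sentence above is cut off in the source. As a point of reference only, the following is a standard sketch of gradient descent under the square loss and its neural-tangent-kernel (kernel gradient descent) linearization; the particular normalization (the factor $1/n$, the step size $\eta$, and the kernel evaluated at initialization $\theta_0$) is an assumption made for illustration and need not match the paper's exact setup.
\[
  L(\theta) \;=\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(f(x_i;\theta)-y_i\bigr)^2,
  \qquad
  \theta_{t+1} \;=\; \theta_t-\eta\,\nabla_\theta L(\theta_t).
\]
In the NTK regime, a first-order expansion of $f$ in $\theta$ yields the kernel gradient descent dynamics
\[
  f_{t+1}(x) \;\approx\; f_t(x)-\frac{\eta}{n}\sum_{i=1}^{n} K(x,x_i)\,\bigl(f_t(x_i)-y_i\bigr),
  \qquad
  K(x,x') \;=\; \bigl\langle \nabla_\theta f(x;\theta_0),\,\nabla_\theta f(x';\theta_0)\bigr\rangle.
\]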
We provide here a brief overview of reproducing kernel Hilbert spaces (RKHS); more details can be found in Appendix G.2, and the defining reproducing property is recalled below for convenience. In this work, we impose the following assumptions.

Remark 5. Assumption D.3 can be replaced by an alternative assumption, that is, ... Assumption D.1 is related to the neural network and GD training, where similar settings have been used in related work. Assumption D.2 imposes conditions on the underlying true conditional probability in the non-separable case; this assumption essentially requires that the conditional probability lie within the function class generated by the GD-trained neural networks we consider (and thus can be calibrated).
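To complement the RKHS overview referenced above, we recall the generic defining (reproducing) property; the notation ($\mathcal{H}$, $K$, $\mathcal{X}$) is ours for illustration and need not coincide with that of Appendix G.2.
\[
  f(x) \;=\; \bigl\langle f,\, K(x,\cdot)\bigr\rangle_{\mathcal{H}}
  \qquad \text{for all } f\in\mathcal{H} \text{ and all } x\in\mathcal{X},
\]
where $K$ is a symmetric positive semi-definite kernel on $\mathcal{X}\times\mathcal{X}$ and $K(x,\cdot)\in\mathcal{H}$ for every $x\in\mathcal{X}$.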