A Appendix

Neural Information Processing Systems 

In this appendix, we provide the following results. In Appendix A.1, we summarize the main notation used in this paper. In Appendices A.2–A.9, we give the proofs of all our theoretical results. In Appendix A.10, we present the overall training procedure (e.g., pseudocode) of our method.

The network function of Eq. (5) at initialization is an i.i.d. centered Gaussian process, i.e., $f(\cdot) \sim \mathcal{N}(0, K)$. Using the definition of the kernel in Eq. (6), the NNGP kernel is a special case of the NTK obtained when only the output layer is trained. The objective function of Eq. (7) can be rewritten as follows.
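As a numerical illustration of the Gaussian-process claim above, the sketch below compares a Monte Carlo estimate of a one-hidden-layer NNGP kernel entry against its closed form. This is a generic check, not the paper's Eq. (6): the ReLU activation, the specific input points, and the arc-cosine closed form (Cho & Saul's order-1 kernel) are all assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input points (unit norm) -- not from the paper.
x = np.array([1.0, 0.0])
y = np.array([0.6, 0.8])

def relu(z):
    return np.maximum(z, 0.0)

# Monte Carlo estimate of one NNGP kernel entry for a ReLU network:
#   K(x, y) = E_{w ~ N(0, I)}[relu(w . x) * relu(w . y)],
# i.e., the covariance of the hidden-layer features at initialization.
n = 1_000_000
W = rng.standard_normal((n, 2))
k_mc = np.mean(relu(W @ x) * relu(W @ y))

# Closed form for the ReLU NNGP kernel (arc-cosine kernel, order 1):
#   K(x, y) = ||x|| ||y|| (sin t + (pi - t) cos t) / (2 pi),
# where t is the angle between x and y.
cos_t = np.clip(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)), -1.0, 1.0)
t = np.arccos(cos_t)
k_cf = (np.linalg.norm(x) * np.linalg.norm(y)
        * (np.sin(t) + (np.pi - t) * np.cos(t)) / (2 * np.pi))

print(f"Monte Carlo: {k_mc:.4f}, closed form: {k_cf:.4f}")
```

At large width, the empirical covariance of the network outputs over random initializations converges to this kernel, which is the sense in which $f(\cdot) \sim \mathcal{N}(0, K)$ at initialization.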