A Appendix

Neural Information Processing Systems

The appendix is organized as follows. In Appendix A.1, we summarize the main notation used in this paper. In Appendices A.2–A.9, we provide the proofs of all our theoretical results. In Appendix A.10, we present the overall training procedures (i.e., pseudocode) of our proposed DINO-INIT and DINO-TRAIN algorithms, as well as the limitations of our work.

Assume that all the parameters of f(·) follow a standard normal distribution. Then, in the limit as the layer width d → ∞, the output function of the distribution-informed neural network f(x) in Eq. (5) at initialization is an i.i.d. centered Gaussian process, i.e., f(·) ∼ N(0, K). Using the definition of the distribution kernel in Eq. (6), we obtain the corresponding kernel K. It is shown in [4] that the key difference between the NNGP kernel and the NTK is that the NTK is generated by a fully-trained neural network, whereas the NNGP kernel is produced by a weakly-trained neural network.
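As an illustrative aside (not part of the paper's method), the infinite-width claim can be checked empirically for a generic network: at initialization, the output at a fixed input is Gaussian with a variance given by the NNGP kernel. The sketch below uses a one-hidden-layer ReLU network with i.i.d. N(0, 1) parameters and 1/√d output scaling, for which the NNGP variance has the closed form K(x, x) = ‖x‖²/2; the width, input, and architecture are assumptions for illustration, not the distribution-informed network of Eq. (5).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000        # hidden-layer width (illustrative)
n_init = 4000   # number of independent random initializations
x = np.array([0.6, 0.8])  # fixed input with unit norm

# f(x) = (1/sqrt(d)) * v . relu(W x), with W, v drawn i.i.d. from N(0, 1).
W = rng.standard_normal((n_init, d, 2))
v = rng.standard_normal((n_init, d))
pre = W @ x                                    # pre-activations, shape (n_init, d)
f = (v * np.maximum(pre, 0.0)).sum(axis=1) / np.sqrt(d)

# NNGP prediction for this architecture: f(x) ~ N(0, ||x||^2 / 2).
print("empirical mean:", f.mean())
print("empirical var :", f.var(), " NNGP prediction:", np.dot(x, x) / 2)
```

Across random initializations the empirical mean is close to 0 and the variance close to ‖x‖²/2 = 0.5, matching the centered Gaussian process predicted in the limit d → ∞.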