Supplementary Material: Appendix
Bayesian Deep Ensembles via the Neural Tangent Kernel

A Recap of standard and NTK parameterisations

Neural Information Processing Systems 

We see that the different parameterisations yield the same distribution for the functional output f(·, θ) at initialisation, but give different scalings to the parameter gradients in the backward pass. GP(0, Θ^L) and is independent of f_0(·) in the infinite width limit. Let X_0 be an arbitrary test set. In fact, even with a heteroscedastic prior θ ∼ N(0, Λ), where Λ ∈ R^{p×p} is a diagonal matrix with positive diagonal entries {λ_j}_{j=1}^p, it is straightforward to show that the correct setting of the regularisation is ‖θ‖²_Λ = θ^⊤ Λ^{-1} θ in order to obtain a posterior sample of θ. For an NN in the linearised regime [23], this is related to the fact that the NTK and standard parameterisations initialise parameters differently, yet yield the same functional distribution for a randomly initialised NN.
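The first point above can be made concrete for a single dense layer. The following is a minimal NumPy sketch (not the paper's code; the single-layer setup and the widths are illustrative assumptions): under the standard parameterisation the 1/√n scaling sits in the weight initialisation, while under the NTK parameterisation it sits in the forward pass, so the outputs at initialisation are identically distributed but the gradients with respect to the trainable parameters differ by a factor of √n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 512, 512                      # fan-in and fan-out of one dense layer (illustrative)
x = rng.standard_normal(n)           # a fixed input with O(1) entries

# Standard parameterisation: trainable weight W ~ N(0, 1/n); forward pass f = W x.
W_std = rng.standard_normal((m, n)) / np.sqrt(n)
f_std = W_std @ x

# NTK parameterisation: trainable weight omega ~ N(0, 1); forward pass
# f = (1/sqrt(n)) omega x.  The 1/sqrt(n) lives in the forward pass, not the init.
omega = rng.standard_normal((m, n))
f_ntk = (omega / np.sqrt(n)) @ x

# Same functional distribution at initialisation: each output coordinate has
# variance ||x||^2 / n under both parameterisations.
print(np.var(f_std), np.var(f_ntk))

# Different gradient scalings in the backward pass.  For the loss L = sum_i f_i,
# dL/dW_std has O(1) entries while dL/domega has O(1/sqrt(n)) entries:
g_std = np.outer(np.ones(m), x)               # dL/dW_std  = 1 x^T
g_ntk = np.outer(np.ones(m), x) / np.sqrt(n)  # dL/domega = (1/sqrt(n)) 1 x^T
print(np.abs(g_std).mean() / np.abs(g_ntk).mean())   # exactly sqrt(n)
```

The √n ratio between the two gradients is what gradient descent "sees" differently under the two parameterisations, even though the forward functions are equal in distribution.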

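The claim about the heteroscedastic prior can be sanity-checked in the conjugate linear-regression setting, which is an illustrative stand-in for the linearised NN (the design matrix Phi, noise level sigma2, and prior scales lam below are made-up values, not from the paper). Drawing θ₀ ∼ N(0, Λ) and a noise perturbation, then minimising the data fit plus the Λ^{-1}-weighted regulariser centred at θ₀, produces exact posterior samples:

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, p = 20, 3
sigma2 = 0.5
Phi = rng.standard_normal((n_obs, p))          # hypothetical design matrix
y = Phi @ rng.standard_normal(p) + np.sqrt(sigma2) * rng.standard_normal(n_obs)

# Heteroscedastic diagonal prior theta ~ N(0, Lambda), Lambda = diag(lam).
lam = np.array([2.0, 0.5, 1.0])
Lam_inv = np.diag(1.0 / lam)

# Exact Gaussian posterior: N(mu, Sigma) with Sigma = (Phi^T Phi / sigma^2 + Lambda^{-1})^{-1}.
A = Phi.T @ Phi / sigma2 + Lam_inv
Sigma = np.linalg.inv(A)
mu = Sigma @ Phi.T @ y / sigma2

# Sample-then-optimise: draw theta0 ~ N(0, Lambda) and eps ~ N(0, sigma^2 I), then
# minimise ||y + eps - Phi theta||^2 / sigma^2 + (theta - theta0)^T Lambda^{-1} (theta - theta0).
# Setting the gradient to zero gives A theta = Phi^T (y + eps) / sigma^2 + Lambda^{-1} theta0.
S = 50000
theta0 = rng.standard_normal((S, p)) * np.sqrt(lam)
eps = np.sqrt(sigma2) * rng.standard_normal((S, n_obs))
rhs = (y + eps) @ Phi / sigma2 + theta0 @ Lam_inv
samples = np.linalg.solve(A, rhs.T).T

# Empirical mean and covariance of the minimisers match the analytic posterior.
print(np.abs(samples.mean(0) - mu).max())       # close to 0
print(np.abs(np.cov(samples.T) - Sigma).max())  # close to 0
```

If the regulariser used ‖θ‖² (i.e. Λ = I) instead of θ^⊤ Λ^{-1} θ, the covariance check above would fail, which is the sense in which θ^⊤ Λ^{-1} θ is the correct setting.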