A Proofs
Neural Information Processing Systems
A.1 Proof of Proposition 1

Proof of Proposition 1. Recall that h denotes the vanilla activations of the network, i.e. those obtained with no noise injection. We do not inject noise into the final, predictive layer of the network, so that the noise at this layer is the noise accumulated from the noising of the previous layers.

Consider first the Taylor series expansion of the loss function with the accumulated noise defined in Proposition 1. Denoting =[

This can be deduced from the slightly opaque Faà di Bruno's formula, which gives the multivariate derivatives of a composition of functions f: R

The final equality comes from the moments of a mean-0 Gaussian, where j takes the values of the multi-index. Though these equalities already offer insight into the regularising mechanisms of GNIs, they are not easy to work with and will often be computationally intractable. We therefore include these terms in the remainder term C.
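The role of the Gaussian moments in the expansion can be illustrated numerically. Because the odd central moments of a mean-0 Gaussian vanish, the leading correction to the expected loss under an injected noise ε ~ N(0, σ²I) is the second-order Taylor term (σ²/2)·tr(∇²L). The sketch below (a toy illustration, not the paper's code; the loss function and activation values are arbitrary choices) checks this agreement by Monte Carlo:

```python
import numpy as np

# Sketch: for a smooth loss L and noise eps ~ N(0, sigma^2 I), the odd
# Taylor terms vanish in expectation, so for small sigma
#   E[L(h + eps)] - L(h)  ≈  (sigma^2 / 2) * tr(Hessian of L at h).
rng = np.random.default_rng(0)

h = np.array([1.0, -0.5, 2.0])   # "vanilla" activations (arbitrary example)
sigma = 0.1                      # small noise scale

def loss(x):
    # Toy loss; its Hessian is diagonal with entries 12 * x_i^2.
    return np.sum(x ** 4)

# Monte Carlo estimate of E[L(h + eps)] - L(h).
eps = sigma * rng.standard_normal((200_000, h.size))
mc_gap = np.mean([loss(h + e) for e in eps]) - loss(h)

# Second-order Taylor prediction: (sigma^2 / 2) * tr(Hessian).
trace_hess = np.sum(12 * h ** 2)
taylor_gap = 0.5 * sigma ** 2 * trace_hess

print(mc_gap, taylor_gap)
```

The two printed quantities agree closely for small σ; the residual gap is the higher-order (fourth-moment) contribution that the proof collects into the remainder term.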