Appendices

A Gradient terms for the adaptation scheme
A.1 Gradients for the entropy approximation

Following the arguments in [13], we can compute the gradient of the term in (13) with respect to θ.

A.2 Gradients for the penalty function

We used the following penalty function: h(x) = max(x − δ, 0)², which is zero below the threshold δ and grows quadratically above it.

A.3 Gradients for the energy error

We can write the energy error as the change in the Hamiltonian along the trajectory, Δ = H(q_L, p_L) − H(q_0, p_0), where (q_L, p_L) denotes the end point of the integrator started at (q_0, p_0). We generalise the arguments from [14], Lemma 7, proceeding by induction over n, with the base case n = 1 holding for any v ∈ ℝ^d.

A.4 Position-dependent preconditioning for general potentials

The suggested approach can perform poorly for non-convex potentials, or even for convex potentials such as those arising in logistic regression models on some data sets. We illustrate here how to learn a reasonable proposal for a general potential function by considering a version of position-dependent preconditioning. Both the transformation f and U generally depend on parameters θ, which we again omit to keep the notation uncluttered. Our approach can be seen as an alternative, for instance, to [31], where such a transformation is first learned by approximating π with a standard Gaussian density using variational inference, while the HMC hyperparameters are adapted in a second step using Bayesian optimisation. The motivation for stopping the gradients comes from considering the special case f: z ↦ Cz, which corresponds to the position-independent preconditioning scheme above.
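As a small illustration of the penalty in A.2, the following JAX sketch implements the squared-hinge form and its gradient; the threshold value 0.5 is illustrative, not a value from the paper.

```python
import jax
import jax.numpy as jnp

def penalty(x, delta=0.5):
    """Squared-hinge penalty h(x) = max(x - delta, 0)^2: zero below the
    threshold delta and quadratic above it, so its gradient is
    continuous at x = delta."""
    return jnp.maximum(x - delta, 0.0) ** 2

penalty_grad = jax.grad(penalty)

print(penalty(0.3), penalty_grad(0.3))  # 0.0 0.0 (below the threshold)
print(penalty(1.5), penalty_grad(1.5))  # 1.0 2.0 (above the threshold)
```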
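For the energy error in A.3, the sketch below differentiates Δ with respect to the integrator step size by backpropagating through a leapfrog trajectory. The potential, trajectory length and initial state are placeholders, and a unit-mass kinetic energy is assumed; none of these choices come from the paper.

```python
import jax
import jax.numpy as jnp

def leapfrog(q, p, step_size, n_steps, grad_U):
    """Standard leapfrog integrator, fully differentiable in step_size."""
    p = p - 0.5 * step_size * grad_U(q)
    for _ in range(n_steps - 1):
        q = q + step_size * p
        p = p - step_size * grad_U(q)
    q = q + step_size * p
    p = p - 0.5 * step_size * grad_U(q)
    return q, p

def energy_error(step_size, q0, p0, U):
    """Energy error of the trajectory: H(q_L, p_L) - H(q_0, p_0)."""
    H = lambda q, p: U(q) + 0.5 * jnp.dot(p, p)  # unit-mass kinetic energy
    qL, pL = leapfrog(q0, p0, step_size, n_steps=10, grad_U=jax.grad(U))
    return H(qL, pL) - H(q0, p0)

# Placeholder potential: standard Gaussian, U(q) = 0.5 * ||q||^2.
U = lambda q: 0.5 * jnp.dot(q, q)
q0 = jnp.array([1.0, -0.5])
p0 = jnp.array([0.3, 0.8])

# Gradient of the energy error with respect to the step size,
# obtained by differentiating through the integrator.
print(jax.grad(energy_error)(0.1, q0, p0, U))
```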
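Finally, a sketch of the stop-gradient idea in A.4; the functional form of the scale is purely illustrative and not the parameterisation used in the paper. Stopping the gradient through the position entering the preconditioner makes differentiation treat f as the constant map z ↦ Cz:

```python
import jax
import jax.numpy as jnp

def f(z, theta):
    """Position-dependent diagonal preconditioner f(z) = scale(z) * z.
    The scale depends on the position only through a stopped gradient,
    so differentiation sees the position-independent special case
    f: z -> C z with C = diag(scale)."""
    z_fixed = jax.lax.stop_gradient(z)
    scale = jnp.exp(theta + 0.1 * jnp.tanh(z_fixed))  # illustrative form
    return scale * z

theta = jnp.zeros(2)
z = jnp.array([0.5, -1.2])

# Because of the stopped gradient, the Jacobian of f with respect to z
# is exactly diag(scale), as it would be for a constant preconditioner.
print(jax.jacobian(f)(z, theta))
```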