Gradient Descent
Appendix of "Complex-valued Neurons Can Learn More but Slower than Real-valued Neurons via Gradient Descent"
A Preliminaries
In this section, we first summarize frequently used notations in the following table.
Table 4: Frequently used notations.
Notation Description
C
Lemma 7. Let d = 1. Combining the cases above completes the proof. Subsection B.2 proves several convergence rate lemmas. Subsection B.3 gives some technical We are now ready to prove Theorem 1. Proof of Theorem 1.
A Proofs of the Main Results
This section describes Stein variational gradient descent (SVGD) by Liu and Wang [19]. The overview is meant as supplementary material for Section 5, where we propose to use SVGD for inferring the DiBS posteriors p(Z | D) and p(Z, Θ | D). In contrast to sampling-based MCMC or optimization-based variational inference methods, SVGD iteratively transports a fixed set of particles to closely match a target distribution, akin to the gradient descent algorithm in optimization. We refer the reader to Liu and Wang [19] for additional details. Let p(x) with x ∈ X be a differentiable density that we want to sample from, e.g., to estimate an expectation.
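To make the particle transport concrete, the following is a minimal NumPy sketch of the SVGD update as it is commonly stated, x_i ← x_i + ε φ(x_i) with φ(x) = (1/M) Σ_j [k(x_j, x) ∇ log p(x_j) + ∇_{x_j} k(x_j, x)]. It is an illustration under stated assumptions, not the implementation used for the DiBS posteriors: the fixed-bandwidth RBF kernel, the one-dimensional Gaussian target, and the names `rbf_kernel`, `svgd_step`, and `run_svgd` are all chosen here for the example.

```python
import numpy as np


def rbf_kernel(x, h):
    """RBF kernel k(x_j, x_i) = exp(-||x_j - x_i||^2 / h) and its gradient in x_j.

    x has shape (M, d). The fixed bandwidth h is an assumption of this sketch.
    """
    diff = x[:, None, :] - x[None, :, :]        # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)        # shape (M, M)
    k = np.exp(-sq_dist / h)                    # k[j, i] = k(x_j, x_i)
    grad_k = -2.0 / h * diff * k[:, :, None]    # gradient w.r.t. x_j, shape (M, M, d)
    return k, grad_k


def svgd_step(x, grad_log_p, step, h):
    """Move all particles once along the SVGD direction phi."""
    k, grad_k = rbf_kernel(x, h)
    score = grad_log_p(x)                       # grad log p at each particle, (M, d)
    # Driving term pulls particles toward high density (k is symmetric);
    # the summed kernel gradient pushes particles apart.
    phi = (k @ score + grad_k.sum(axis=0)) / x.shape[0]
    return x + step * phi


def run_svgd(n_particles=50, n_steps=1000, step=0.1, h=1.0, seed=0):
    """Transport particles toward a hypothetical target N(2, 1) in one dimension."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=-5.0, scale=1.0, size=(n_particles, 1))  # start far from the target

    def grad_log_p(z):
        return -(z - 2.0)                       # score of N(2, 1)

    for _ in range(n_steps):
        x = svgd_step(x, grad_log_p, step, h)
    return x


particles = run_svgd()
print("particle mean:", particles.mean(), "particle std:", particles.std())
```

Only the gradient of log p enters the update, so the target density does not need to be normalized. The repulsive kernel-gradient term is what keeps the particles from collapsing onto the mode; the step size and bandwidth above are illustrative and would need tuning for other targets.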