 Gradient Descent



Appendix of "Complex-valued Neurons Can Learn More but Slower than Real-valued Neurons via Gradient Descent" A Preliminaries

Neural Information Processing Systems

In this section, we first summarize frequently used notations in the following table. Table 4: Frequently used notations. [...] Lemma 7. Let d = 1. [...] Combining the cases above completes the proof. [...] Subsection B.2 proves several convergence rate lemmas. Subsection B.3 gives some technical [...] We are now ready to prove Theorem 1. Proof of Theorem 1.



A Guide Through the Zoo of Biased SGD

Neural Information Processing Systems

We also provide examples where biased estimators outperform their unbiased counterparts or where unbiased versions are simply not available. Finally, we demonstrate the effectiveness of our framework through experimental results that validate our theoretical findings.


Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels

Neural Information Processing Systems

Specifically, we prove that, for a simple data distribution with sparse signal amidst high-variance noise, a simple convolutional neural network trained using stochastic gradient descent simultaneously learns to threshold out the noise and find the signal.





A contrastive rule for meta-learning

Neural Information Processing Systems

Our rule may be understood as a generalization of contrastive Hebbian learning to meta-learning and, notably, it neither requires computing second derivatives nor going backwards in time, two characteristic features of previous gradient-based methods that are hard to conceive in physical neural circuits.


A Proofs of the Main Results

Neural Information Processing Systems

This section describes Stein variational gradient descent (SVGD) by Liu and Wang [19]. The overview is meant as supplementary material for Section 5, where we propose to use SVGD for inferring the DiBS posteriors p(Z | D) and p(Z, Θ | D). In contrast to sampling-based MCMC or optimization-based variational inference methods, SVGD iteratively transports a fixed set of particles to closely match a target distribution, akin to the gradient descent algorithm in optimization. We refer the reader to Liu and Wang [19] for additional details. Let p(x) with x ∈ X be a differentiable density that we want to sample from, e.g., to estimate an expectation.
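
To make the idea of "transporting a fixed set of particles" concrete, here is a minimal NumPy sketch of the commonly stated SVGD update, in which each particle moves along a kernel-weighted average of the score ∇ log p plus a repulsive kernel-gradient term. The snippet above does not spell out the update rule, so everything here is an illustrative assumption rather than the DiBS implementation: the RBF kernel with fixed bandwidth `h`, the one-dimensional Gaussian target N(2, 1), the step size, and the helper names `rbf_kernel`, `svgd_step`, and `run_svgd` are all hypothetical.

```python
import numpy as np


def rbf_kernel(x, h):
    """RBF kernel matrix and its gradients for particles x of shape (n, d).

    Returns K with K[j, i] = k(x_j, x_i) and grad_K with
    grad_K[j, i] = gradient of k(x_j, x_i) with respect to x_j.
    """
    diff = x[:, None, :] - x[None, :, :]          # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * h ** 2))
    grad_K = -diff / h ** 2 * K[:, :, None]
    return K, grad_K


def svgd_step(x, grad_log_p, step, h):
    """One SVGD update: x_i <- x_i + step * phi(x_i)."""
    n = x.shape[0]
    K, grad_K = rbf_kernel(x, h)
    score = grad_log_p(x)                         # shape (n, d)
    # Attractive term pulls particles toward high density;
    # repulsive term (summed kernel gradients) keeps them spread out.
    phi = (K.T @ score + grad_K.sum(axis=0)) / n
    return x + step * phi


def run_svgd(x0, grad_log_p, n_steps=1000, step=0.1, h=1.0):
    x = x0.copy()
    for _ in range(n_steps):
        x = svgd_step(x, grad_log_p, step, h)
    return x


# Assumed toy target: p(x) = N(2, 1), so grad log p(x) = -(x - 2).
def target_score(x):
    return -(x - 2.0)


rng = np.random.default_rng(0)
initial = rng.normal(loc=0.0, scale=1.0, size=(50, 1))
particles = run_svgd(initial, target_score)
print("particle mean:", particles.mean(), "particle std:", particles.std())
```

An expectation under p can then be estimated by averaging a function over `particles`. The fixed bandwidth keeps the sketch short; a data-dependent bandwidth is a common alternative, and the choice affects how well the particle spread matches the target.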