B.3 Derivations of Eq. (19)
Similar to the derivation above, we give the gradient with respect to the weight vector $w \in \mathbb{R}^M_+$, which is given by
$$\nabla_w D_{\mathrm{KL}} = -\nabla_w \log Z(U, w) - \nabla_w \mathbb{E}_{U,w}\big[(\log p_\theta(X \mid z))^T 1_N\big] + \nabla_w \mathbb{E}_{U,w}\big[(\log p_\theta(U \mid z))^T w\big].$$
The learning rate of each stochastic gradient descent step is $\gamma_t \propto t^{-1}$, where $t \in \{1, \dots, T\}$ denotes the iteration of the optimization.
We already report the t-SNE visualization of ByPE-VAE and the standard VAE in Figure. Here we give more t-SNE visualization results. First, we randomly sample from ByPE-VAEs trained on different datasets, namely MNIST, Fashion MNIST, and CelebA, as shown in Fig. 7.
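As a rough illustration of the step-size schedule in B.3, the sketch below runs stochastic gradient descent with a learning rate proportional to $t^{-1}$ on a toy problem. It is not the ByPE-VAE training code: the objective, the gradient noise, the constant `gamma0`, and the helper names `step_size`, `projected_sgd`, and `noisy_grad` are all hypothetical. Clipping the weights at zero after each step is likewise an assumption, made here only to keep $w$ in the nonnegative orthant.

```python
import numpy as np


def step_size(t, gamma0=1.0):
    """Learning rate gamma_t = gamma0 / t for iteration t = 1, ..., T."""
    return gamma0 / t


def projected_sgd(grad_fn, w0, T, gamma0=1.0, seed=0):
    """Run T noisy gradient steps, clipping w to be nonnegative after each.

    grad_fn(w, rng) must return a stochastic gradient estimate at w.
    The clipping step is an assumption used to keep w in R^M_+.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for t in range(1, T + 1):
        g = grad_fn(w, rng)
        w = np.maximum(w - step_size(t, gamma0) * g, 0.0)
    return w


# Hypothetical toy objective: 0.5 * ||w - target||^2 with Gaussian gradient noise.
target = np.array([1.0, -2.0, 0.5])


def noisy_grad(w, rng):
    return (w - target) + 0.1 * rng.standard_normal(w.shape)


w_final = projected_sgd(noisy_grad, w0=np.ones(3), T=2000)
print(w_final)
```

In this toy run the coordinate whose unconstrained optimum is negative is held at zero by the clipping step, while the other coordinates settle near their targets as the step size decays.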
A The Noisy Quadratic Setting: Additional Details
In this section we extend our discussion of the noisy quadratic model (NQM). We first discuss stability in the NQM. We then provide proofs for the results in Section 4. We also extend our discussion of robust stability and the stability of models with hidden states.
A.1 Stability
In this subsection we expand on the short discussion of key stability results in the body of the paper. We will primarily discuss stability of the nominal system.