



Appendices

Neural Information Processing Systems

Appendix A provides derivations supporting Section 3 in the main paper. In this section we provide detailed derivations of the ST-DGMRF joint distribution, for both first-order transition models (Section A.1) and higher-order transition models (Section A.2). A.1 Joint distribution. The LDS (see Sections 2.2 and 3.1 in the main paper) defines a joint distribution over system states ... First, note that Eq. (1) can be written as a set of linear equations ... We make use of this property in the DGMRF formulation and in the conjugate gradient method. Eq. (11) is converted into a discrete-time dynamical system by approximating ρ ... We consider two ST-DGMRF variants that capture different amounts of prior knowledge. DGMRF transition matrices can be parameterized accordingly. The air quality dataset is based on hourly PM2.5 measurements obtained from [ ... The raw PM2.5 measurements are log-transformed and standardized to zero mean and unit ... About 50% of the nodes are masked out (purple nodes within ... We use a simple MLP with one hidden layer of width 16 with ReLU activations and no output non-linearity. The DGMRF parameters are not shared across time, allowing for dynamically changing spatial covariance patterns.
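The MLP described in the snippet above (one hidden layer of width 16, ReLU activations, no output non-linearity) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the input and output dimensions, the initialisation scheme, and the names `init_mlp`, `affine` and `mlp_forward` are hypothetical, since the snippet does not specify them.

```python
import random


def init_mlp(in_dim, out_dim, hidden=16, seed=0):
    """Randomly initialise a one-hidden-layer MLP (hidden width 16 by default).

    The Gaussian initialisation is an assumption made for this sketch.
    """
    rng = random.Random(seed)

    def matrix(rows, cols):
        scale = (1.0 / cols) ** 0.5
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(hidden, in_dim), "b1": [0.0] * hidden,
        "W2": matrix(out_dim, hidden), "b2": [0.0] * out_dim,
    }


def affine(W, b, x):
    """Compute W x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_forward(params, x):
    """Hidden layer with ReLU, then a purely linear output layer."""
    hidden = [max(0.0, z) for z in affine(params["W1"], params["b1"], x)]
    return affine(params["W2"], params["b2"], hidden)


# Hypothetical dimensions: 3 input features, 2 outputs.
params = init_mlp(in_dim=3, out_dim=2)
y = mlp_forward(params, [0.5, -1.0, 2.0])
```

Because the output layer is linear, the outputs are unconstrained real numbers, which is what "no output non-linearity" implies.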






Appendix A: Proof of Proposition 1 in Section 2

Neural Information Processing Systems

ReLU(T(v − u) + b) = ReLU(Tv + b), where u ≠ 0; that is, ReLU(T · + b) is not injective. ... By injectivity of T, we finally get a = b. Remark 2. An example that satisfies (3.1) is the neural operator whose ... This construction is given by the combination of "Pairs of projections" discussed in Kato [2013, Section I.4.6] with the idea presented in [Puthawala et al., 2022b, Lemma 29]. ... We write operator G by ... Thus, in both cases, H is injective. Remark 4. We make the following observations using Theorem 1: Leaky ReLU is one example that satisfies (ii) in Theorem 1. Puthawala et al. [2022a, Theorem 15] assumes that ... We first revisit layerwise injectivity and bijectivity in the case of the finite rank approximation.
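A small finite-dimensional illustration of the non-injectivity statement above, offered as a sketch rather than as part of the proof: even when T is injective, ReLU applied after the affine map can send two different inputs to the same output, whereas Leaky ReLU keeps them apart. The 2×2 matrix, the vectors, and the slope 0.1 are arbitrary choices made for this example.

```python
def relu(z):
    """Elementwise ReLU."""
    return [max(0.0, zi) for zi in z]


def leaky_relu(z, slope=0.1):
    """Elementwise Leaky ReLU; strictly increasing, hence injective per entry."""
    return [zi if zi >= 0 else slope * zi for zi in z]


def affine(T, b, v):
    """Compute T v + b for a list-of-rows matrix T."""
    return [sum(t * vi for t, vi in zip(row, v)) + bi for row, bi in zip(T, b)]


T = [[1.0, 0.0], [0.0, 1.0]]   # identity, so T itself is injective
b = [0.0, 0.0]
v = [-1.0, -2.0]
u = [0.0, 1.0]                 # nonzero perturbation
v_minus_u = [vi - ui for vi, ui in zip(v, u)]

# ReLU collapses both (distinct) inputs to the zero vector.
relu_collides = relu(affine(T, b, v_minus_u)) == relu(affine(T, b, v))

# Leaky ReLU preserves the difference between the two inputs.
leaky_collides = leaky_relu(affine(T, b, v_minus_u)) == leaky_relu(affine(T, b, v))
```

Here `relu_collides` is true and `leaky_collides` is false, which matches the remark that Leaky ReLU is an activation for which injectivity can be preserved.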


A Technical Proofs. Proof of Proposition 4.1. Using the chain rule, (1), and the definitions of ...

Neural Information Processing Systems

This appendix presents the technical details of efficiently implementing Algorithm 2. B.1 Computing Intermediate Quantities. We argue that in the setting of neural networks, Algorithm 2 can obtain the intermediate quantities ζ ... Algorithm 3 gives a subroutine for computing the necessary scalars used in the efficient squared norm function of the embedding layer. Algorithm 3: Computing the Nonzero Values of n ... In the former case, it is straightforward to see that we incur a compute (resp. ... F.1 Effect of Batch Size on Fully-Connected Layers. Figure 4 presents numerical results for the same set of experiments as in Subsection 5.1 but for different batch sizes |B| instead of the output dimension q. Similar to Subsection 5.1, the results in Figure 4 are more favorable towards Adjoint compared to GhostClip.
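As background for the fully-connected-layer comparison above, the sketch below shows one standard way a per-sample gradient squared norm can be obtained without materialising the gradient. It assumes a plain linear layer y = W a with a single (non-sequence) input per sample, in which case the per-sample weight gradient is the outer product of the output gradient and the input. This is a generic identity used by ghost-norm-style methods and is not a reproduction of Algorithm 2 or Algorithm 3; all names and values are hypothetical.

```python
def outer(delta, a):
    """Per-sample weight gradient of y = W a: outer product delta a^T (q x p)."""
    return [[d * x for x in a] for d in delta]


def frobenius_sq(M):
    """Squared Frobenius norm of a list-of-rows matrix."""
    return sum(v * v for row in M for v in row)


def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)


a = [1.0, 2.0, 3.0]    # layer input for one sample (input dimension p = 3)
delta = [0.5, -1.0]    # gradient of the loss w.r.t. the layer output (q = 2)

# Explicit route: build the q x p gradient, then take its norm (O(p q) memory).
explicit = frobenius_sq(outer(delta, a))

# Implicit route: ||delta a^T||_F^2 = ||delta||^2 * ||a||^2 (O(p + q) memory).
implicit = sq_norm(delta) * sq_norm(a)
```

Both routes give the same value; the implicit one avoids forming the q × p matrix, which is why the relative cost of such methods depends on the output dimension q and, across samples, on the batch size |B|.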