Appendices

Neural Information Processing Systems 

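Appendix A provides derivations supporting Section 3 in the main paper. In Appendix B, we explain our experimental setup, including dataset preparation and model implementation, in more detail. Finally, Appendix C provides additional results supporting our claims regarding the scalability of our method, together with additional results from the experiments presented in Section 4.

In this section, we provide detailed derivations of the ST-DGMRF joint distribution, both for first-order transition models (Section A.1) and for higher-order transition models (Section A.2).

A.1 Joint distribution

The LDS (see Sections 2.2 and 3.1 in the main paper) defines a joint distribution over the system states $x_0, x_1, \dots, x_K$. First, note that Eq. (1) can be written as a set of linear equations

$$x_0 = c_0 + \epsilon_0, \qquad x_k = F_k x_{k-1} + c_k + \epsilon_k, \quad k = 1, \dots, K.$$

Moving all $x_k$-terms to the left-hand side, we can rewrite this as a matrix-vector multiplication

$$\underbrace{\begin{bmatrix} I & & & & \\ -F_1 & I & & & \\ & -F_2 & I & & \\ & & \ddots & \ddots & \\ & & & -F_K & I \end{bmatrix}}_{F} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_K \end{bmatrix} = \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_K \end{bmatrix} + \begin{bmatrix} \epsilon_0 \\ \epsilon_1 \\ \vdots \\ \epsilon_K \end{bmatrix}.$$

Empty positions in $F$ represent zero-blocks. Now, we can express $x$ as an affine transformation of $\epsilon$:

$$x = F^{-1}c + F^{-1}\epsilon, \tag{3}$$

where $F^{-1}$ exists because $\det(F) = 1$. This holds because $F$ is block lower-triangular with identity blocks on its diagonal. Since $\epsilon$ is distributed as $\epsilon \sim \mathcal{N}(0, Q^{-1})$ with $Q = \operatorname{diag}(Q_0, Q_1, \dots, Q_K)$, and $c$ is deterministic, we can use the affine property of Gaussian distributions to obtain the joint distribution

$$x \sim \mathcal{N}\left(F^{-1}c,\; F^{-1}Q^{-1}F^{-\top}\right) = \mathcal{N}\left(\mu,\; \Omega^{-1}\right), \qquad \mu = F^{-1}c, \quad \Omega = F^{\top} Q F.$$

This reduces both computations and memory requirements. In contrast, the information vector $\eta = \Omega\mu$ can be expressed compactly as

$$\eta = F^{\top} Q F F^{-1} c = F^{\top} Q c, \tag{8}$$

which can be computed efficiently using sparse and parallel matrix-vector multiplications on a GPU.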
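The construction of the joint distribution can be checked numerically on a small example. The sketch below is a hypothetical NumPy illustration, not the authors' implementation. It assumes:

- dense matrices;
- randomly drawn transition matrices $F_k$ and precisions $Q_k$;
- the recursion $x_0 = c_0 + \epsilon_0$, $x_k = F_k x_{k-1} + c_k + \epsilon_k$ implied by the block structure of $F$.

The helper names `block_diag`, `build_F`, `lds_joint` and `simulate` are introduced here for illustration only. The sketch assembles $F$, confirms that unrolling the recursion gives the same $x$ as the affine form in Eq. (3), and checks that $\Omega\mu = F^{\top} Q c$ as in Eq. (8).

```python
# Hypothetical illustration with dense NumPy arrays; not the authors' implementation.
import numpy as np


def block_diag(blocks):
    """Place square blocks on the diagonal of an otherwise zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        d = b.shape[0]
        out[i:i + d, i:i + d] = b
        i += d
    return out


def build_F(transitions):
    """Block lower-bidiagonal F: identity blocks on the diagonal, -F_k directly below."""
    K = len(transitions)
    d = transitions[0].shape[0]
    F = np.eye((K + 1) * d)
    for k, Fk in enumerate(transitions, start=1):
        F[k * d:(k + 1) * d, (k - 1) * d:k * d] = -Fk
    return F


def lds_joint(transitions, precisions, c):
    """Return F, the mean mu, the precision Omega and the information vector eta."""
    F = build_F(transitions)
    Q = block_diag(precisions)
    mu = np.linalg.solve(F, c)   # mu = F^{-1} c
    Omega = F.T @ Q @ F          # Omega = F^T Q F
    eta = F.T @ (Q @ c)          # eta = F^T Q c, Eq. (8): no inverse of F needed
    return F, mu, Omega, eta


def simulate(transitions, c_blocks, eps_blocks):
    """Unroll x_0 = c_0 + eps_0 and x_k = F_k x_{k-1} + c_k + eps_k step by step."""
    xs = [c_blocks[0] + eps_blocks[0]]
    for Fk, ck, ek in zip(transitions, c_blocks[1:], eps_blocks[1:]):
        xs.append(Fk @ xs[-1] + ck + ek)
    return np.concatenate(xs)


# Small random example: d-dimensional states, K transitions (K + 1 time steps).
rng = np.random.default_rng(0)
d, K = 3, 4
transitions = [0.5 * rng.standard_normal((d, d)) for _ in range(K)]
precisions = []
for _ in range(K + 1):
    A = rng.standard_normal((d, d))
    precisions.append(A @ A.T + d * np.eye(d))  # symmetric positive definite Q_k
c_blocks = [rng.standard_normal(d) for _ in range(K + 1)]
c = np.concatenate(c_blocks)

F, mu, Omega, eta = lds_joint(transitions, precisions, c)

# Draw eps_k ~ N(0, Q_k^{-1}): with Q_k = L L^T, eps_k = L^{-T} z has covariance Q_k^{-1}.
eps_blocks = [
    np.linalg.solve(np.linalg.cholesky(Qk).T, rng.standard_normal(d))
    for Qk in precisions
]
x_recursive = simulate(transitions, c_blocks, eps_blocks)
x_affine = np.linalg.solve(F, c + np.concatenate(eps_blocks))  # Eq. (3)

print("det(F):", np.linalg.det(F))
print("recursion matches Eq. (3):", np.allclose(x_recursive, x_affine))
print("Omega @ mu matches eta:", np.allclose(Omega @ mu, eta))
```

The dense `np.linalg.solve` is used only because the example is tiny. Because $F$ is block lower-triangular, a triangular solve would scale better at realistic sizes. The same holds for the sparse, parallel matrix-vector products described in the text. Computing $\eta$ via Eq. (8) avoids any solve at all.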
