Scalable Inference in SDEs by Direct Matching of the Fokker–Planck–Kolmogorov Equation
This supplementary document is organized as follows. We provide details in terms of the concept of a 'solution' to an SDE, how we use a finite-differences … As illustrated in Figure 1 in the main paper, the concept of a 'solution' to an SDE is broader than that of … This is what is done in this paper. We can now interpret Eq. (7) through these finite difference … The model which we call a 'GP-SDE' model in the main paper has appeared in various forms in the literature before. It directly resembles a 'random' ODE model, where the random field … Figure 1 in the main paper, just providing further examples from the test set. For the timing experiments in Sec. 3, we constructed a setup that allowed us to control the approximation error.
Simulation-based techniques such as variants of stochastic Runge–Kutta are the de facto approach for inference with stochastic differential equations (SDEs) in machine learning. These methods are general-purpose and used with parametric and non-parametric models, and neural SDEs. Stochastic Runge–Kutta relies on the use of sampling schemes that can be inefficient in high dimensions. We address this issue by revisiting the classical SDE literature and derive direct approximations to the (typically intractable) Fokker–Planck–Kolmogorov equation by matching moments. We show how this workflow is fast, scales to high-dimensional latent spaces, and is applicable to scarce-data applications, where a non-parametric SDE with a driving Gaussian process velocity field specifies the model.
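To make the moment-matching idea concrete, here is a minimal sketch (our illustration, not the paper's code) for the linear case, where the Fokker–Planck–Kolmogorov equation yields exact closed ODEs for the mean and covariance: for dx = A x dt + L dβ with spectral density Q, dm/dt = A m and dP/dt = A P + P Aᵀ + L Q Lᵀ, so the Gaussian approximation can be propagated without sampling paths. The function name and the Ornstein–Uhlenbeck test case are our own choices.

```python
import numpy as np

def propagate_moments(A, LQL, m0, P0, T, steps):
    """Euler integration of the mean/covariance ODEs
    dm/dt = A m,  dP/dt = A P + P A^T + L Q L^T."""
    dt = T / steps
    m, P = m0.copy(), P0.copy()
    for _ in range(steps):
        m = m + dt * (A @ m)
        P = P + dt * (A @ P + P @ A.T + LQL)
    return m, P

# 1-D Ornstein-Uhlenbeck process: dx = -theta * x dt + sigma dbeta
theta, sigma = 1.0, 0.5
A = np.array([[-theta]])
LQL = np.array([[sigma**2]])
m, P = propagate_moments(A, LQL, np.array([2.0]), np.array([[0.1]]),
                         T=3.0, steps=20000)

# Closed-form moments of the OU process, for comparison
m_true = 2.0 * np.exp(-theta * 3.0)
P_true = sigma**2 / (2 * theta) + (0.1 - sigma**2 / (2 * theta)) * np.exp(-2 * theta * 3.0)
```

For nonlinear drifts the moment ODEs do not close, which is where the approximations derived in the paper come in; this linear case is only meant to show the workflow of matching moments instead of simulating sample paths.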
How not to Stitch Representations to Measure Similarity: Task Loss Matching versus Direct Matching
András Balogh, Márk Jelasity
Measuring the similarity of the internal representations of deep neural networks is an important and challenging problem. Model stitching has been proposed as a possible approach, where two half-networks are connected by mapping the output of the first half-network to the input of the second one. The representations are considered functionally similar if the resulting stitched network achieves good task-specific performance. The mapping is normally created by training an affine stitching layer on the task at hand while freezing the two half-networks, a method called task loss matching. Here, we argue that task loss matching may be very misleading as a similarity index. For example, it can indicate very high similarity between very distant layers, whose representations are known to have different functional properties. Moreover, it can indicate very distant layers to be more similar than architecturally corresponding layers. Even more surprisingly, when comparing layers within the same network, task loss matching often indicates that some layers are more similar to a given layer than that layer is to itself. We argue that the main reason behind these problems is that task loss matching tends to create out-of-distribution representations to improve task-specific performance. We demonstrate that direct matching (where the mapping minimizes the distance between the stitched representations) does not suffer from these problems. We compare task loss matching, direct matching, and well-known similarity indices such as CCA and CKA. We conclude that direct matching strikes a good balance between the structural and functional requirements for a good similarity index.
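The direct-matching idea described above can be sketched in a few lines: instead of training the affine stitching layer on the task loss, fit it by ordinary least squares so that it minimizes the distance between the two representations. This is our own minimal illustration, not the authors' code; `X` and `Y` stand for the activation matrices of the two layers being compared, and all names are ours.

```python
import numpy as np

def direct_match(X, Y):
    """Fit an affine map X -> Y by least squares (direct matching).
    Returns the weight matrix W and bias vector b minimizing ||X W + b - Y||."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    Wb, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return Wb[:-1], Wb[-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # activations of layer A (200 inputs, 8 units)
W_true = rng.normal(size=(8, 8))
Y = X @ W_true + 0.5                 # layer B: an exactly affine function of X
W, b = direct_match(X, Y)
residual = np.linalg.norm(X @ W + b - Y)  # near zero when Y is affine in X
```

The size of the residual (suitably normalized) can then serve as the similarity index; unlike task loss matching, the fitted map never sees the task loss, so it cannot inflate similarity by producing out-of-distribution representations.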