Appendix for Model-based Policy Optimization with Unsupervised Model Adaptation
A Omitted Proofs
Neural Information Processing Systems
Besides the Wasserstein distance, other distribution divergence metrics can be used to align the features. Maximum mean discrepancy (MMD) is another instance of an integral probability metric (IPM), obtained when the witness function class is the unit ball in a reproducing kernel Hilbert space (RKHS). The results on three environments are shown in Figure 5. The one-step model losses during the experiments in the other four environments are shown in Figure D.5, and the conclusion in Section 5.2 still holds in these four environments.
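To make the MMD variant concrete, the following is a minimal sketch of an empirical squared-MMD estimate between two batches of feature vectors. It is an illustration only, not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the biased (V-statistic) estimator, and the names `gaussian_kernel` and `mmd_squared` are all assumptions introduced here.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Pairwise Gaussian (RBF) kernel matrix between rows of x and rows of y.

    The kernel choice and bandwidth are assumptions made for this sketch.
    """
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y.

    x and y are arrays of shape (n, d) and (m, d), e.g. hypothetical
    feature batches from real and simulated transitions.
    """
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(size=(200, 4))
    same_b = rng.normal(size=(200, 4))
    shifted = rng.normal(loc=2.0, size=(200, 4))
    # Samples from the same distribution should give a smaller value
    # than samples from a shifted distribution.
    print("same distribution:   ", mmd_squared(same_a, same_b))
    print("shifted distribution:", mmd_squared(same_a, shifted))
```

In a feature-alignment setting, a quantity of this kind would be minimized with respect to the feature extractor so that the two batches become harder to tell apart; in practice the bandwidth is often set by a heuristic such as the median pairwise distance, which this sketch does not attempt.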
Nov-13-2025, 11:46:18 GMT