Appendix of Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
Yao Mu, The University of Hong Kong
Neural Information Processing Systems
Ping Luo is the corresponding author.

With Equation 3 and Jensen's inequality applied in Equation 1, we have I(x, y) E

Therefore, if the number of confounders increases, the demand for data grows exponentially. When the data is not rich enough, the necessary condition may not be satisfied.

We provide the pseudo-code of DOMINO combined with model-based methods. First, the past state-action pairs are encoded into the disentangled context vectors by the context encoder.

Initialize batch B. for i = 1 to B do sample V

Listing 1: PyTorch-style pseudo-code for dynamics change based on the MuJoCo engine.
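
The remark above about data demand growing exponentially with the number of confounders can be illustrated with a small calculation. This is only a sketch under stated assumptions, not the derivation from the appendix: it assumes the bound in question is an InfoNCE-style contrastive lower bound, whose value cannot exceed log N when N samples are used, and it assumes (purely for illustration) that each confounder contributes an equal, independent amount of mutual information. The function names `infonce_ceiling` and `min_samples_for` are hypothetical.

```python
import math


def infonce_ceiling(num_samples: int) -> float:
    """Largest value (in nats) an InfoNCE-style estimate can reach with num_samples samples."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    return math.log(num_samples)


def min_samples_for(mi_nats: float) -> int:
    """Smallest N with log(N) >= mi_nats, i.e. the fewest samples for which the bound can be tight."""
    if mi_nats < 0:
        raise ValueError("mutual information is non-negative")
    return math.ceil(math.exp(mi_nats))


if __name__ == "__main__":
    # Assumption for illustration: every confounder adds 2 nats of mutual information.
    mi_per_confounder = 2.0
    for k in range(1, 6):
        joint = min_samples_for(k * mi_per_confounder)  # one bound on the entangled context
        per_term = min_samples_for(mi_per_confounder)   # each term of a decomposed bound
        print(f"confounders={k}  joint N>={joint}  decomposed N>={per_term} per term")
```

Under these assumptions the sample requirement of a single joint bound scales as exp(k) in the number of confounders k, while each term of a decomposed objective keeps a fixed requirement, which is one way to read the motivation for decomposing the mutual information.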
Aug-17-2025, 19:37:58 GMT