A.1 Conjugate Derivations

Cross-Entropy Loss:
$$L(h, y) = -\sum_{c=1}^{C} y_c \log h_c$$
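As a concrete illustration, here is a minimal NumPy sketch of this loss; the function name and the clipping constant eps are our own additions for numerical safety, not part of the derivation:

```python
import numpy as np

def cross_entropy(h, y, eps=1e-12):
    """Cross-entropy L(h, y) = -sum_c y_c * log(h_c).

    h   : predicted class probabilities, shape (C,)
    y   : one-hot (or soft) target distribution, shape (C,)
    eps : small constant to avoid log(0); an implementation
          detail, not part of the definition above.
    """
    h = np.clip(h, eps, 1.0)
    return -np.sum(y * np.log(h))

# Example: 3 classes, true class 0.
h = np.array([0.7, 0.2, 0.1])
y = np.array([1.0, 0.0, 0.0])
print(cross_entropy(h, y))  # ≈ 0.357, i.e. -log(0.7)
```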


The losses are compared on three degrees of shift (easy, moderate, and hard), controlled by the drift distance of the Gaussian clusters. Here we discuss the chosen architecture and the implementation details. Note that the task (surrogate) loss function is used to update the meta-loss m_ϕ during meta-learning. The number of transformer layers and the number of hidden layers in the MLP are each selected from {1, 2}. We can see that the choice of task loss barely affects the learnt meta-loss.
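To make these architecture choices concrete, the sketch below shows one plausible way to assemble such a meta-loss network m_ϕ in PyTorch, with the number of transformer layers and MLP hidden layers exposed as the {1, 2}-valued choices mentioned above. The input features, dimensions, and pooling are our assumptions, not the paper's specification:

```python
import torch
import torch.nn as nn

class MetaLoss(nn.Module):
    """Illustrative meta-loss network m_phi: maps per-example features
    (e.g., predictions and targets) to a scalar loss value. Only the
    {1, 2}-valued layer counts come from the text; everything else here
    is an assumption made for the sketch."""

    def __init__(self, in_dim, d_model=32, n_transformer_layers=1, n_mlp_layers=1):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(
            encoder_layer, num_layers=n_transformer_layers)
        mlp = []
        for _ in range(n_mlp_layers):
            mlp += [nn.Linear(d_model, d_model), nn.ReLU()]
        mlp.append(nn.Linear(d_model, 1))  # scalar loss head
        self.head = nn.Sequential(*mlp)

    def forward(self, features):
        # features: (batch, seq, in_dim) -> scalar loss, differentiable in phi
        z = self.encoder(self.embed(features))
        return self.head(z.mean(dim=1)).mean()

m_phi = MetaLoss(in_dim=4)
loss = m_phi(torch.randn(8, 5, 4))
```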


A Proofs

Proof of Proposition 4.2

Proposition 4.2. The performance gap of evaluating the policy profiles (π, µ) and (π, π …


Proof of Theorem 4.7

We first prove a lemma.

Theorem A.2 (Theorem 1 in [36]). Let ϵ = max …

Theorem 4.7. In a two-player game, suppose that … According to Theorem A.2, we have J(π, µ) − J(π, α) ≤ E[…].

CQL [20] regularizes the learned Q-function to penalize out-of-distribution actions. The CSP algorithm is illustrated in Algorithm 1. The proxy model is trained adversarially against our agent; we therefore set the proxy's reward function to be the negative of our agent's reward. We give the experiment details of the Maze example in this section.
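For reference, below is a minimal sketch of a CQL-style conservative penalty for a discrete-action Q-network. The function and variable names are ours, and the trade-off coefficient and the Bellman-error term that accompany this penalty in CQL [20] are omitted:

```python
import torch

def cql_regularizer(q_values, actions):
    """Conservative penalty in the style of CQL [20]: pushes Q down on
    all actions via a logsumexp while pushing Q up on the actions that
    actually appear in the dataset, penalizing out-of-distribution actions.

    q_values : (batch, n_actions) Q(s, a) for all discrete actions
    actions  : (batch,) indices of the dataset actions
    """
    logsumexp_q = torch.logsumexp(q_values, dim=1)            # soft max over actions
    data_q = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    return (logsumexp_q - data_q).mean()

# In CQL this term is added, with a coefficient, to the usual TD error.
```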



Supplementary Material

A Experiment Details


A.1 Source code

Upon request, we will provide an anonymized version of our code in the rebuttal. We replicated our experiments using the codebase provided by Shah et al. [2022], which can be found on GitHub. To ensure consistency, we used the same hyperparameters as specified in the code or the paper for each baseline. We initialize the parameters such that the predicted metric is close to the Euclidean metric; this helps ensure the stability of the metric learning.
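As an illustration of this initialization, here is a minimal sketch assuming a Mahalanobis-style parameterization of the learned metric (our assumption for the sketch; the actual metric model may differ):

```python
import torch
import torch.nn as nn

class MahalanobisMetric(nn.Module):
    """Illustrative learned metric d(x, y) = sqrt((x-y)^T L L^T (x-y)).
    The point it illustrates is the initialization: L starts at (a small
    perturbation of) the identity, so the predicted metric starts close
    to the Euclidean metric, stabilizing early metric learning."""

    def __init__(self, dim, init_noise=1e-3):
        super().__init__()
        self.L = nn.Parameter(torch.eye(dim) + init_noise * torch.randn(dim, dim))

    def forward(self, x, y):
        diff = (x - y) @ self.L                 # (batch, dim)
        return diff.pow(2).sum(dim=-1).sqrt()

metric = MahalanobisMetric(dim=8)
x, y = torch.randn(4, 8), torch.randn(4, 8)
# At initialization this is ≈ the Euclidean distance ||x - y||.
print(metric(x, y))
```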